EWU Institutional Repository

Cyber Threat Detection Using Machine Learning Algorithms On Heterogeneous VHS-22 Dataset

dc.contributor.advisor Zahidur Rahman
dc.contributor.author Rahman, Faiaz
dc.contributor.author Tanna, Rafee Zunaied
dc.contributor.author Habiba, Umme
dc.date.accessioned 2023-01-31T09:58:41Z
dc.date.available 2023-01-31T09:58:41Z
dc.date.issued 2023-01-18
dc.identifier.uri http://dspace.ewubd.edu:8080/handle/123456789/3870
dc.description This thesis was submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Information and Communication Engineering of East West University, Dhaka, Bangladesh. en_US
dc.description.abstract Research on emerging machine learning (ML) techniques for detecting network threats, such as malware-related threats, requires large and varied amounts of data. The research community has been using a number of network traffic datasets proposed in recent years. However, most of these datasets contain only a few classes of bots and malware, and lack the diversity needed for threat detection to generalize. In this work, we consider VHS-22, a heterogeneous dataset of 27.7 million records. It contains flow parameters extracted with a software network probe from four datasets and a network traffic malware monitoring website. Our methodology evaluates different machine learning techniques and ensemble classifiers. Bagging Decision Tree, Random Forest, Extremely Randomized Trees, Decision Tree, and Histogram-Based Gradient Boosting, among others, successfully identify more than 99% of the malware-related threats. Additionally, we constructed a prototype dataset named MiniVHS-22 from the original VHS-22 dataset to reduce the computational burden of model training and evaluation for future researchers. We calculated the ratio of normal to attack data in the original dataset and maintained the same ratio in the 1M-record MiniVHS-22 dataset. On it, we applied dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) with varying numbers of components, and we explain our analysis in the results section. The results of our investigation can be used to develop sophisticated network traffic threat detection systems. en_US
dc.language.iso en_US en_US
dc.publisher East West University en_US
dc.relation.ispartofseries ;ECE00262
dc.subject Cyber Threat Detection, Emerging machine learning, techniques for detecting network threats, malware-related threats, Algorithms On Heterogeneous VHS-22 Dataset en_US
dc.title Cyber Threat Detection Using Machine Learning Algorithms On Heterogeneous VHS-22 Dataset en_US
dc.type Thesis en_US
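
The abstract describes MiniVHS-22 as a smaller subset that keeps the original ratio of normal to attack data, which corresponds to stratified subsampling. A minimal Python sketch of that idea follows, under stated assumptions: the function name `stratified_subsample`, the label strings, and the toy 900/100 class split are hypothetical illustrations and are not taken from the thesis, whose actual sampling procedure is not given in this record.

```python
import random
from collections import Counter, defaultdict


def stratified_subsample(labels, sample_size, seed=0):
    """Return sorted row indices of a subsample that keeps the class ratio of `labels`.

    Hypothetical helper for illustration; not the thesis authors' code.
    """
    total = len(labels)
    if not 0 < sample_size <= total:
        raise ValueError("sample_size must be between 1 and len(labels)")

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Largest-remainder allocation with integer arithmetic:
    # floor each class quota, then give leftover slots to the
    # classes with the largest remainders.
    quota = {c: sample_size * len(ix) // total for c, ix in by_class.items()}
    remainder = {c: sample_size * len(ix) % total for c, ix in by_class.items()}
    leftover = sample_size - sum(quota.values())
    for c in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[c] += 1

    # Draw each class quota without replacement.
    rng = random.Random(seed)
    chosen = []
    for c, ix in by_class.items():
        chosen.extend(rng.sample(ix, quota[c]))
    return sorted(chosen)


# Toy data with a made-up 9:1 split (assumption, not the VHS-22 ratio).
labels = ["normal"] * 900 + ["attack"] * 100
subset = stratified_subsample(labels, 100, seed=1)
counts = Counter(labels[i] for i in subset)
print(counts)  # the subsample keeps the 9:1 split of the toy labels
```

Sampling per class, rather than uniformly over all rows, guarantees the ratio is preserved up to rounding instead of only in expectation.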

