Applied Machine Learning in Recognition of DGA Domain Names
- SekuriPy LLC, Mirka Račkog 10
10360 Zagreb, Croatia
miroslav.stampar@sekuripy.hr
- Faculty of Electrical Engineering and Computing, Unska 3
10000 Zagreb, Croatia
kresimir.fertalj@fer.hr
Abstract
Recognition of domain names generated by domain generation algorithms (DGAs) is an essential part of malware detection through network traffic inspection. Besides basic heuristics (HE) and limited detection based on blacklists, the most promising approach appears to be machine learning (ML). There is a lack of studies that extensively compare different ML models for binary DGA classification, including both conventional and deep learning (DL) representatives. Moreover, the few existing studies either focus on a small set of models, use a poor feature set in their ML models, or fail to ensure unbiased independence between training and evaluation samples. To overcome these limitations, we engineered a robust feature set and accordingly trained and evaluated 14 ML, 9 DL, and 2 comparative models on two independent datasets. Results show that if ML features are properly engineered, the difference in overall score between the top ML and DL representatives is only marginal. This paper represents the first attempt to neutrally compare the performance of many different models for the recognition of DGA domain names, and the best models perform as well as the top representatives from the literature.
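The abstract does not enumerate the engineered features. Purely as an illustrative sketch, and not the authors' feature set, the snippet below computes a few lexical features often used in DGA detection (label length, Shannon character entropy, digit ratio and vowel ratio), which a binary classifier could then consume. The function names `shannon_entropy` and `lexical_features` are hypothetical, and the label extraction is a naive assumption (it ignores the Public Suffix List, so multi-part suffixes such as `co.uk` are handled incorrectly).

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of s, in bits per character (0.0 for an empty string)."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Hypothetical lexical features for one domain name.

    Assumption: the analysed label is the one directly left of the TLD
    (naive split on dots, no Public Suffix List lookup).
    """
    parts = domain.lower().rstrip(".").split(".")
    label = parts[-2] if len(parts) > 1 else parts[0]
    n = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / n if n else 0.0,
        "vowel_ratio": vowels / n if n else 0.0,
    }


if __name__ == "__main__":
    for d in ("www.google.com", "xjwqkz7f3p9h.net"):
        print(d, lexical_features(d))
```

For example, `lexical_features("www.google.com")` analyses the label `google`, while a pseudo-random label such as `xjwqkz7f3p9h` would typically show a higher digit ratio and far fewer vowels, which is the kind of signal such features try to capture.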
Key words
domain generation algorithm, binary classification, supervised machine learning, deep learning, blind evaluation
Digital Object Identifier (DOI)
https://doi.org/10.2298/CSIS210104046S
Publication information
Volume 19, Issue 1 (January 2022)
Year of Publication: 2022
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium
How to cite
Štampar, M., Fertalj, K.: Applied Machine Learning in Recognition of DGA Domain Names. Computer Science and Information Systems, Vol. 19, No. 1, 205-227. (2022), https://doi.org/10.2298/CSIS210104046S