Anomalous Traffic Identification Method for POST Messages Based on Gambling Website Templates
- College of Information Engineering, Shanghai Maritime University
201306 Shanghai, China
fengzhimin@stu.shmtu.edu.cn
- College of Information Engineering, Shanghai Maritime University
201306 Shanghai, China
dzhan@shmtu.edu.cn
- Network Security Center, The Third Research Institute of the Ministry of Public Security
200031 Shanghai, China
wusongyang@stars.org.cn
- Network Security Center, The Third Research Institute of the Ministry of Public Security
200031 Shanghai, China
sunwenqi@gass.ac.cn
- College of Information Engineering, Shanghai Maritime University
201306 Shanghai, China
shishuxin@stu.shmtu.edu.cn
Abstract
Malicious websites pose significant social risks, which calls for identification methods that are automatic, efficient, and accurate. This paper proposes a POST traffic classification method based on website templates to identify abnormal traffic from gambling websites. POST message data is collected from several gambling sites with Fiddler, and features such as URLs, cookie parameters, and request body parameters are extracted to build a Gambling Website Single POST Message Dataset (GSPD). These features are converted into vector representations using Word2Vec and TF-IDF. Hierarchical clustering then identifies the template-generated message types, achieving unsupervised template recognition. Based on the clustering results, individual POST messages are labeled, and features are extracted using TF-IDF and mutual information. The parameters of a Support Vector Machine (SVM) are then optimized with the Particle Swarm Optimization (PSO) algorithm. Experimental results show that the model achieves a test set accuracy of 0.9985 with high precision, recall, and F1-scores, and that it effectively identifies gambling and other illegal websites.
Key words
Template recognition, illegal website detection, feature extraction, POST traffic classification
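
The unsupervised template-recognition step summarized in the abstract can be illustrated with a minimal sketch. It assumes each POST message has already been reduced to a list of tokens (URL path segments, cookie keys, request body parameter names), uses TF-IDF only (the Word2Vec component is omitted), and applies a simple average-linkage agglomerative clustering over cosine distance. The sample messages, the function names, and the distance threshold are all hypothetical and are not taken from the GSPD dataset or the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF keeps a nonzero weight for tokens found in every document.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def cluster_templates(docs, threshold):
    """Average-linkage agglomerative clustering; stop when the closest
    pair of clusters is farther apart than `threshold` (hypothetical value)."""
    vectors = tfidf_vectors(docs)
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dist > threshold:
            break
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical tokenized POST messages from two imagined site templates.
messages = [
    ["api", "login", "PHPSESSID", "username", "password", "captcha"],
    ["api", "login", "PHPSESSID", "username", "password", "token"],
    ["member", "deposit", "JSESSIONID", "amount", "bank", "uid"],
    ["member", "deposit", "JSESSIONID", "amount", "channel", "uid"],
]
labels = cluster_templates(messages, threshold=0.6)
print(labels)
```

In this sketch the cluster labels play the role of template labels for the individual POST messages. A supervised classifier, such as the PSO-tuned SVM described in the abstract, could then be trained on them; that step is not shown here.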
How to cite
Feng, Z., Han, D., Wu, S., Sun, W., Shi, S.: Anomalous Traffic Identification Method for POST Messages Based on Gambling Website Templates. Computer Science and Information Systems