Anomalous Traffic Identification Method for POST Messages Based on Gambling Website Templates

Zhimin Feng1, Dezhi Han2, Songyang Wu3, Wenqi Sun4 and Shuxin Shi5

  1. College of Information Engineering, Shanghai Maritime University
    201306 Shanghai, China
    fengzhimin@stu.shmtu.edu.cn
  2. College of Information Engineering, Shanghai Maritime University
    201306 Shanghai, China
    dzhan@shmtu.edu.cn
  3. Network Security Center, The Third Research Institute of the Ministry of Public Security
    200031 Shanghai, China
    wusongyang@stars.org.cn
  4. Network Security Center, The Third Research Institute of the Ministry of Public Security
    200031 Shanghai, China
    sunwenqi@gass.ac.cn
  5. College of Information Engineering, Shanghai Maritime University
    201306 Shanghai, China
    shishuxin@stu.shmtu.edu.cn

Abstract

Malicious websites pose significant social risks, so automatic, efficient, and accurate identification methods are needed. This paper proposes a POST traffic classification method based on website templates to identify abnormal traffic from gambling websites. POST messages are captured from several gambling sites with Fiddler, and features such as URLs, cookie parameters, and request body parameters are extracted to build the Gambling Website Single POST Message Dataset (GSPD). These features are converted into vector representations using Word2Vec and TF-IDF. Hierarchical clustering then groups the messages by the template that generated them, achieving unsupervised template recognition. The clustering results are used to label individual POST messages, and features are extracted using TF-IDF and mutual information. Finally, the parameters of a Support Vector Machine (SVM) classifier are optimized with the Particle Swarm Optimization (PSO) algorithm. In experiments, the model reaches a test-set accuracy of 0.9985 with high precision, recall, and F1-score, indicating that it can effectively identify gambling and other illegal websites.
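
The template-recognition step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that only parameter names and URL path segments are used as tokens, replaces the Word2Vec and TF-IDF vectorization with a hand-written smoothed TF-IDF, and replaces the paper's hierarchical clustering with a simple single-linkage merge under an assumed cosine-similarity threshold of 0.5. The hosts, cookies, and parameter names are hypothetical, and the mutual-information and PSO-SVM stages are omitted.

```python
import math
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def tokens(url, cookie, body):
    """Turn one POST message into tokens: URL path segments plus cookie/body parameter names."""
    path = [f"path:{p}" for p in urlsplit(url).path.split("/") if p]
    cookies = [f"cookie:{c.split('=', 1)[0].strip()}" for c in cookie.split(";") if c.strip()]
    params = [f"body:{k}" for k, _ in parse_qsl(body, keep_blank_values=True)]
    return path + cookies + params

def tfidf(docs):
    """Compute smoothed TF-IDF vectors (as dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per token
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vecs, threshold=0.5):
    """Single-linkage agglomerative clustering: merge while any cross-cluster pair is similar enough."""
    clusters = [[i] for i in range(len(vecs))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(cosine(vecs[a], vecs[b]) >= threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters]

# Hypothetical POST messages: (URL, Cookie header, form-encoded body).
messages = [
    ("http://a.example/api/login", "sid=1; lang=zh", "user=u1&pwd=x&code=1"),
    ("http://b.example/api/login", "sid=9; lang=en", "user=u2&pwd=y&code=7"),
    ("http://c.example/pay/submit", "token=abc", "amount=10&bank=icbc"),
    ("http://d.example/pay/submit", "token=xyz", "amount=99&bank=abc"),
]
vectors = tfidf([tokens(*m) for m in messages])
groups = cluster(vectors)  # message indices grouped by shared request structure
```

In this toy input, messages served by different hosts but sharing the same path and parameter names fall into one group, which is the intuition behind treating a cluster as a site template; the resulting group indices could then serve as labels for a supervised classifier.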

Key words

Template recognition, illegal website detection, feature extraction, POST traffic classification

How to cite

Feng, Z., Han, D., Wu, S., Sun, W., Shi, S.: Anomalous Traffic Identification Method for POST Messages Based on Gambling Website Templates. Computer Science and Information Systems