Identification and Detection of Illegal Gambling Websites and Analysis of User Behavior

Zhimin Zhang1, Dezhi Han1, Songyang Wu2, Wenqi Sun2 and Shuxin Shi1

  1. College of Information Engineering, Shanghai Maritime University
    201306 Shanghai, China
    zhangzhimin@stu.shmtu.edu.cn, dzhan@shmtu.edu.cn, shishuxin@stu.shmtu.edu.cn
  2. Network Security Center, The Third Research Institute of the Ministry of Public Security
    200031 Shanghai, China
    wusongyang@stars.org.cn, sunwenqi@gass.ac.cn

Abstract

Illegal gambling websites use advanced technology to evade regulation, which poses a challenge for cybersecurity. To address this, we propose a machine learning method that identifies these sites and analyzes the behavior of their users. The method extracts key data from post messages captured in a real-world network environment and generates word vectors using Word2Vec weighted by TF-IDF. A Stacked Denoising Auto Encoder (SDAE) then reduces the dimensionality of these vectors and extracts features from them. Next, Agglomerative Clustering, improved by combining distance caching with heap optimization, performs an initial clustering that groups websites sharing the same type of post template into one cluster. Within each website cluster, a secondary clustering step then integrates multiple algorithms and combines their results through a voting consensus function based on cosine similarity, so that users' different operational behaviors fall into different clusters. Experimental results show that the method improves both the detection of illegal gambling sites and the classification of user activities, offering new insights for combating these sites.
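
As an illustration of the initial clustering step, the following is a minimal sketch of agglomerative clustering that caches inter-cluster distances and keeps candidate merges in a heap. It is not the authors' implementation: the function names, the use of average linkage, the cosine distance on raw vectors, and the toy input are all assumptions made for this example.

```python
import heapq
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(points, n_clusters):
    """Average-linkage clustering (assumed linkage) with a distance cache and a heap."""
    clusters = {i: [i] for i in range(len(points))}  # active cluster id -> member indices
    cache = {}  # (smaller id, larger id) -> cached linkage distance
    heap = []   # candidate merges as (distance, id_a, id_b)

    # Compute every pairwise distance once and store it in the cache.
    for i, j in combinations(range(len(points)), 2):
        d = cosine_distance(points[i], points[j])
        cache[(i, j)] = d
        heapq.heappush(heap, (d, i, j))

    next_id = len(points)
    while len(clusters) > n_clusters and heap:
        _, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # stale entry: one side was already merged away

        members_a = clusters.pop(a)
        members_b = clusters.pop(b)
        size_a, size_b = len(members_a), len(members_b)

        # Derive distances to the merged cluster from cached values
        # (Lance-Williams update for average linkage), not from raw points.
        for c in clusters:
            d_ac = cache[(min(a, c), max(a, c))]
            d_bc = cache[(min(b, c), max(b, c))]
            d_new = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
            cache[(c, next_id)] = d_new  # c < next_id always holds
            heapq.heappush(heap, (d_new, c, next_id))

        clusters[next_id] = members_a + members_b
        next_id += 1

    # Convert the surviving clusters into one label per input point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


# Toy feature vectors standing in for SDAE outputs (hypothetical data).
toy_points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = agglomerative(toy_points, 2)
```

In this sketch the heap avoids rescanning all cluster pairs to find the closest one, and the cache lets each merge reuse earlier distances. Stale heap entries are skipped lazily when popped rather than removed eagerly.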

Key words

Gambling websites, post messages, feature extraction, illegal website identification, cluster analysis

How to cite

Zhang, Z., Han, D., Wu, S., Sun, W., Shi, S.: Identification and Detection of Illegal Gambling Websites and Analysis of User Behavior. Computer Science and Information Systems