A K-means algorithm based on characteristics of density applied to network intrusion detection

Jing Xu¹, Dezhi Han¹, Kuan-Ching Li² and Hai Jiang³

¹ Shanghai Maritime University, Shanghai, 201306, China
² Providence University, Taichung 43301, Taiwan
³ Arkansas State University, Jonesboro, Arkansas 72467, USA

Abstract

K-means algorithms are a popular family of unsupervised algorithms widely used for cluster analysis. However, the results of traditional K-means clustering depend heavily on the initial cluster centers, which leads to unstable accuracy and low speed and makes the algorithm hard to apply at Big Data scale. In this paper, an improved K-means algorithm that uses density to select the initial cluster seeds is proposed. First, a Kd-tree is used to partition the hyper-rectangular data space, so that points close to each other are grouped into the same subtree during data pre-processing, and the generalized information is stored in the tree structure. In addition, an improved Kd-tree nearest-neighbor search is used within the K-means algorithm to prune the search space and speed up the computation. The clustering results show that the clusters are stable and accurate when the number of clusters and the number of iterations are held constant. Experimental results on a network intrusion detection case show that the improved K-means algorithm performs better in terms of detection rate and false rate.
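The three ideas named in the abstract (density-based seeding, a Kd-tree over the space, and a pruned nearest-neighbor search in the assignment step) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the density definition (neighbor count within a fixed `radius`), the seed separation rule, and the function names `build_kdtree`, `nearest`, `density_seeds` and `kmeans` are assumptions made only for illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split the points on the median of one axis per tree level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (point, distance) of the nearest stored point, pruning far subtrees."""
    if node is None:
        return best
    d = math.dist(node["point"], target)
    if best is None or d < best[1]:
        best = (node["point"], d)
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[1]:
        best = nearest(far, target, best)
    return best


def density_seeds(points, k, radius):
    """Pick up to k dense points that lie more than `radius` apart.

    Assumption: density is the number of points within `radius` (brute force,
    O(n^2)); the paper's own density definition is not given in the abstract.
    """
    density = {p: sum(math.dist(p, q) <= radius for q in points) for p in points}
    seeds = []
    for p in sorted(points, key=density.get, reverse=True):
        if all(math.dist(p, s) > radius for s in seeds):
            seeds.append(p)
        if len(seeds) == k:
            break
    return seeds


def kmeans(points, k, radius, iterations=10):
    """K-means with density seeding and Kd-tree lookup of the closest center."""
    centers = density_seeds(points, k, radius)
    for _ in range(iterations):
        tree = build_kdtree(centers)  # small tree over the current centers
        clusters = {c: [] for c in centers}
        for p in points:
            clusters[nearest(tree, p)[0]].append(p)
        # Move each center to the mean of its members; keep it if it has none.
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in clusters.items()
        ]
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers = kmeans(points, k=2, radius=2.0)
```

Points are passed as tuples so that they can serve as dictionary keys. Because the seeds are chosen deterministically from the data rather than at random, repeated runs with the same `k`, `radius` and iteration count give the same centers, which is the kind of stability the abstract refers to.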

Key words

Network security; K-means; Kd-tree; Network intrusion detection

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS200406014X

Publication information

Volume 17, Issue 2 (June 2020)
Year of Publication: 2020
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Xu, J., Han, D., Li, K., Jiang, H.: A K-means algorithm based on characteristics of density applied to network intrusion detection. Computer Science and Information Systems, Vol. 17, No. 2, 665–687. (2020), https://doi.org/10.2298/CSIS200406014X