Ishika Mandloi

Jul 19, 2021

K-means clustering and its real use case in security

Clustering means grouping data points so that points in the same cluster are similar to one another, while points in different clusters are dissimilar.

K-means is a clustering algorithm that groups similar data points into clusters. The "K" is the number of clusters, which you choose in advance, and "means" refers to averaging the data points in each cluster to find its center (the centroid).

K-means clustering is an unsupervised learning algorithm because there is no target variable to predict. Its main aim is to minimize the sum of the squared distances between the data points and the centroid of the cluster each point belongs to.
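Written out, the objective commonly associated with k-means is the within-cluster sum of squared distances. The symbols below are notation assumed here for illustration: C_j is the set of points in cluster j, mu_j is the centroid of that cluster, and x_i is a data point.

```latex
% Assumed notation: k clusters C_1..C_k, centroid \mu_j = mean of the points in C_j
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A smaller J means the points sit closer to the centroid of their own cluster.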

Initially, we randomly select K data points. Each one acts as a cluster with a single data point, so the point itself is the centroid of that cluster. We then measure the distance from every data point to each centroid and assign each data point to the cluster whose centroid is nearest. Finally, we recalculate each centroid as the mean of the data points assigned to its cluster.

We repeat the assignment and mean-calculation steps using the updated centroids. We stop iterating when the cluster assignments and the centroids no longer change.
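The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the sample points are all made up for this example, and it assumes Euclidean distance and Python 3.8+ (for `math.dist`).

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Hypothetical helper written for this post; returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Pick k data points at random to act as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None

    for _ in range(max_iters):
        # Assign every point to the cluster whose centroid is nearest.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop once no point changes cluster between iterations.
        if new_labels == labels:
            break
        labels = new_labels

        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )

    return centroids, labels


# Made-up 2-D sample data: two visibly separate groups.
sample_points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
]
sample_centroids, sample_labels = kmeans(sample_points, k=2)
print(sample_centroids, sample_labels)
```

Because the starting centroids are random, which cluster ends up as 0 or 1 can vary with the seed, and on harder data different starts can lead to different final clusters.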

K-means clustering can be used for many real-life use cases where we have to cluster similar items together.

Classifying network traffic:

As more and more services begin to use the APIs of your application, or as your website grows, it is important to know where the traffic is coming from. For example, you want to be able to block harmful traffic. However, it is hard to tell harmful traffic apart from legitimate traffic.

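K-means clustering can be used to group traffic sources by their characteristics. Once the clusters are created, you can inspect them and classify the traffic types. The process is faster than labeling each source by hand, and it can be accurate when the traffic types form well-separated clusters.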
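As a rough sketch of that last step, a new traffic source can be given the type of its nearest cluster centroid. Everything here is assumed for illustration: the two features (requests per minute and fraction of error responses), the centroid values, the labels "normal" and "suspicious", and the helper name `classify_source`. K-means itself only produces the clusters; a person still decides what each cluster means.

```python
import math

# Hypothetical centroids, as if an earlier k-means run had produced them.
# Features per source: (requests per minute, fraction of error responses).
# The values and the labels are invented for this example.
CENTROIDS = {
    "normal": (20.0, 0.02),
    "suspicious": (900.0, 0.45),
}


def classify_source(features):
    """Return the label of the centroid closest to `features`."""
    return min(
        CENTROIDS,
        key=lambda name: math.dist(features, CENTROIDS[name]),
    )


print(classify_source((25.0, 0.01)))
print(classify_source((850.0, 0.50)))
```

In this sketch the request rate dominates the distance because its numbers are much larger than the error fraction, so in practice the features would usually be scaled to comparable ranges before clustering.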

So, k-means can be used in many real-life cases where we need to cluster data.

Thank you for reading ….