A Simple Explanation of K-Means Clustering


Table Of Contents

  • Introduction
  • How does the K-means algorithm work?
  • How to choose the value of K?
  • Elbow Method
  • Silhouette Method
  • Advantages of K-means
  • Disadvantages of K-means


How does the K-means clustering algorithm work?

  1. Choose the number of clusters, k.
  2. Initialize the k centroids.
  3. Assign each point to the group of its nearest centroid, then move each centroid to the average of its group.
  • Figure 1 shows data for two different items: the first item is shown in blue and the second in red. Here I have arbitrarily set K to 2 and selected two points as the initial centroids. There are several methods for choosing a suitable value of K.
  • In figure 2, we join the two selected points with a line and draw the perpendicular bisector of that line. Every data point lies on the side of the centroid it is closer to, so the bisector splits the data into two groups. Each selected point then moves to the centroid (the average position) of its group. Notice that some of the red points now fall on the blue side; these points now belong to the blue group.
  • The same process continues in figure 3. We join the two centroids, draw the perpendicular bisector, and recompute the centroid of each group. The two centroids move again, and some more red points are reassigned to the blue group.
  • The same process happens in figure 4. It continues until the group assignments stop changing, which leaves us with two clearly separated clusters.
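
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production implementation: the function names (`k_means`, `squared_distance`, `mean_point`), the sample points, and the choice to pick the initial centroids at random from the data are all assumptions made for this example.

```python
import random

def squared_distance(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    # Coordinate-wise average of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the group of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the clusters are stable.
        labels = new_labels
        # Update step: move each centroid to the average of its group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # Keep the old centroid if a group ends up empty.
                centroids[j] = mean_point(members)
    return centroids, labels

# Hypothetical data: two small, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

On this sample data the first three points end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random starting centroids.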

How to choose the value of K?

Elbow Method
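
As a rough sketch of the idea: the elbow method runs K-means for several values of K, records the within-cluster sum of squares (WCSS) for each, and looks for the K after which the WCSS stops dropping sharply. The code below is an illustration under stated assumptions, not a reference implementation: the helper names (`sq_dist`, `farthest_first`, `wcss_for_k`) and the sample data are made up, and the centroids are initialized with a deterministic farthest-first rule purely to keep the example reproducible.

```python
def sq_dist(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    # Assumed initialization: start from the first point, then repeatedly add
    # the point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def wcss_for_k(points, k, max_iters=100):
    # Run K-means for one value of k and return the within-cluster sum of squares.
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

# Hypothetical data: three tight groups of 2-D points.
points = [
    (1, 1), (1, 2), (2, 1),
    (8, 8), (8, 9), (9, 8),
    (1, 8), (1, 9), (2, 8),
]
wcss = {k: wcss_for_k(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

For this data the WCSS falls steeply up to K = 3 and only slightly afterwards, so the "elbow" of the curve suggests K = 3.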

Silhouette Method
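
As a rough sketch of the idea: the silhouette method scores a clustering by comparing, for each point, its average distance a to the other points in its own cluster with its average distance b to the points of the nearest other cluster. The score of a point is (b - a) / max(a, b), and the K with the highest average score is preferred. The function name `silhouette_score`, the sample points, and the two label lists below are assumptions made for this example; the convention of scoring a single-point cluster as 0 is a common one, adopted here for simplicity.

```python
import math

def silhouette_score(points, labels):
    # Average silhouette coefficient; needs at least two distinct clusters.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # Convention: a lone point scores 0.
            continue
        # a: mean distance to the other points in the same cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the groups together

print(silhouette_score(points, good_labels))
print(silhouette_score(points, bad_labels))
```

A score close to 1 means the points sit well inside their own cluster; scores near 0 or below suggest that points are on a boundary or in the wrong cluster. To choose K, one would compute this score on the labels produced for each candidate K and pick the largest.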

Advantages of K-means

  1. It is very simple to implement.
  2. It scales to large datasets and runs quickly on them.
  3. It adapts easily to new examples.
  4. With suitable extensions, it can be generalized to clusters of different shapes and sizes.

Disadvantages of K-means

  1. It is sensitive to outliers.
  2. Choosing the value of k manually is difficult.
  3. It scales poorly as the number of dimensions increases, because distances between points become less informative.
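
The first disadvantage follows directly from how centroids are computed: a centroid is an average, and a single extreme value can pull an average a long way. Here is a tiny one-dimensional illustration with made-up numbers:

```python
from statistics import mean

# Hypothetical 1-D cluster and the same cluster with one outlier added.
cluster = [1.0, 2.0, 3.0]
with_outlier = cluster + [100.0]

centroid_before = mean(cluster)       # average of the original points
centroid_after = mean(with_outlier)   # average once the outlier is included

print(centroid_before, centroid_after)
```

The centroid moves from 2.0 to 26.5, far away from the three points that actually form the cluster.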




Aditya Kumar Pandey
