K-means clustering is a clustering algorithm that aims to partition n observations into k clusters.
There are 3 steps:
Initialisation – K initial “means” (centroids) are generated at random
Assignment – K clusters are created by associating each observation with the nearest centroid
Update – The centroid of each cluster becomes that cluster's new mean
Assignment and Update are repeated iteratively until convergence
The end result is that the sum of squared errors (SSE) between points and their respective centroids is minimised, though only to a local minimum that depends on the initial centroids.
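The three steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `n_iter` and `seed`, and the choice to initialise by sampling k distinct observations are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative k-means sketch; returns (centroids, labels, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialisation: use k distinct observations as the starting centroids
    # (one common way to pick random "means"; an assumption of this sketch).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment: squared Euclidean distance from each point to each centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: each centroid becomes the mean of the points assigned to it;
        # an empty cluster keeps its previous centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Convergence: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final labels and sum of squared errors for the returned centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), labels].sum()
    return centroids, labels, sse


# Hypothetical toy data: two tight groups on a line.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 0.0], [5.1, 0.0], [5.2, 0.0]])
centroids, labels, sse = kmeans(X, k=2)
```

The cluster numbers in `labels` are arbitrary: a different `seed` can return the same grouping with the numbers swapped.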
Some things to take note of though:
k-means clustering is very sensitive to scale because it relies on Euclidean distance, so normalize the data if features are measured on very different scales.
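If your data has symmetries, some points may be assigned to the wrong cluster.
It is recommended to run k-means several times with different initial centroids and take the most common label for each point. Cluster numbering is arbitrary, so match clusters across runs before comparing labels.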
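As a rough sketch of the first and last points, scikit-learn can standardise the features and rerun k-means from several initialisations. Note that this swaps in a different selection rule from the one described above: `KMeans` with `n_init` keeps the single run with the lowest sum of squared errors (`inertia_`) rather than taking a majority vote over labels. The helper name `cluster_scaled` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_scaled(X, k, n_init=10, seed=0):
    """Standardise features, then run k-means with several restarts."""
    # Rescale every feature to zero mean and unit variance so that no
    # single feature dominates the Euclidean distance.
    X_scaled = StandardScaler().fit_transform(X)
    # n_init restarts with different initial centroids; scikit-learn keeps
    # the run with the lowest sum of squared errors.
    model = KMeans(n_clusters=k, n_init=n_init, random_state=seed).fit(X_scaled)
    return X_scaled, model.labels_, model.inertia_


# Hypothetical data: the second feature is roughly 1000x larger than the first.
X = np.array([[1.0, 1000.0], [1.2, 1100.0], [0.9, 900.0],
              [5.0, 9000.0], [5.2, 9100.0], [4.9, 8900.0]])
X_scaled, labels, sse = cluster_scaled(X, k=2)
```

Choosing the lowest-SSE run avoids having to match cluster numbers between runs, which a majority vote would require.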