How to increase the number of nodes sampled in training KMeans? #2563

Answered by kunal-kotian
namespace-Pt asked this question in Q&A
faiss.Kmeans has a parameter max_points_per_centroid, which defaults to 256. With k clusters, at most k * 256 data points are used to fit k-means; a larger dataset is subsampled down to that size. In your case that works out to 2,560,000 points (so k = 10,000), subsampled from your full dataset. To use all 8M samples for fitting, pass max_points_per_centroid=800 to the faiss.Kmeans() constructor, since 8,000,000 / 10,000 = 800.

For reference, see this link: https://github.com/facebookresearch/faiss/wiki/FAQ#can-i-ignore-warning-clustering-xxx-points-to-yyy-centroids
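
To make the arithmetic concrete, here is a minimal sketch of how the value 800 is derived and passed to the constructor. The helper names `points_per_centroid_needed` and `train_kmeans_on_all_points` are made up for illustration, and k = 10,000 is inferred from the 2,560,000 figure above rather than stated in the question. The training function assumes `x` is a C-contiguous float32 NumPy array of shape (n, d).

```python
import math


def points_per_centroid_needed(n_points: int, k: int) -> int:
    """Smallest max_points_per_centroid for which k * value >= n_points."""
    return math.ceil(n_points / k)


def train_kmeans_on_all_points(x, k: int, niter: int = 20):
    """Fit faiss.Kmeans without subsampling x (hypothetical helper).

    x is assumed to be a C-contiguous float32 array of shape (n, d).
    """
    import faiss  # imported here so the arithmetic helper works without faiss

    n, d = x.shape
    kmeans = faiss.Kmeans(
        d,
        k,
        niter=niter,
        # Raise the cap so that k * cap covers every row of x.
        max_points_per_centroid=points_per_centroid_needed(n, k),
    )
    kmeans.train(x)
    return kmeans


# Numbers from this thread: 8M points, k inferred as 10,000.
default_cap = 10_000 * 256                               # points used by default
needed = points_per_centroid_needed(8_000_000, 10_000)   # value to pass instead
```

Rounding up matters when n is not an exact multiple of k; otherwise a few points would still be dropped by the subsampling step. Training on every point will take proportionally longer than training on the default subsample.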

Answer selected by namespace-Pt