Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k-nearest-neighbors caching is broken (unsound) #2082

Closed
wchargin opened this issue Apr 1, 2019 · 3 comments · Fixed by #2171
Closed

k-nearest-neighbors caching is broken (unsound) #2082

wchargin opened this issue Apr 1, 2019 · 3 comments · Fixed by #2171

Comments

@wchargin
Copy link
Contributor

wchargin commented Apr 1, 2019

Missed this in my review of #1901. The cache key check is incorrect. The
code reads:

private async computeKnn(
data: DataPoint[],
nNeighbors: number): Promise<knn.NearestEntry[][]> {
if (this.nearest != null && nNeighbors <= this.nearest.length) {

where the nNeighbors <= this.nearest.length is intended to check
whether the cached KNN computation was computed with a value of
nNeighbors not smaller than the currently requested value. But in fact
this.nearest.length is the number of data points, not the number of
neighbors computed for each point.

To verify, add

    console.log(
        `Found ${nNeighbors}-nearest: shape is ` +
        `(${this.nearest.length}, ${this.nearest[0].length})`
    );

below line 384 in projectUmap. Then, run UMAP, and subsequently re-run
UMAP with a higher value of k. The output is (e.g.)

Found 15-nearest: shape is (1024, 15)
Found 23-nearest: shape is (1024, 15)

which is wrong, because after attempting to find the 23-nearest
neighbors we only have 15 elements for each data point.

I’m not sure why this never hits a hard error anywhere in the
pipeline—implicit conversion of undefined to 0/NaN somewhere?—but
it definitely causes observable effects. To observe, patch in #2080* to
prevent t-SNE from running automatically when its tab is selected, then:

  • in one tab, load the projector page and run t-SNE with
    perplexity=100;

  • in another tab, load the projector page and run t-SNE with
    perplexity=8, then re-run it with perplexity=100.

These two tabs should yield approximately the same projection, but
instead the projection in the first tab converges to a normal result
whereas the projection in the second tab diverges to points far apart
with little visible structure:

Screenshot of t-SNE okay as described above

Screenshot of t-SNE diverged as described above

* Tested at commit b0310cd.

@wchargin
Copy link
Contributor Author

wchargin commented Apr 1, 2019

Assigning @cannoneyed.

@shashvatshahi1998
Copy link
Contributor

@wchargin can I work on this, just guide a little more about the issue.

@cannoneyed
Copy link
Contributor

Hey @wchargin, now that everything is sorted on the rest of the UMAP end, I'm gonna take care of this. Hopefully I'll have a PR for you by EOD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants