k-nearest-neighbors caching is broken (unsound) #2082

wchargin · 2019-04-01T23:53:12Z

Missed this in my review of #1901. The cache key check is incorrect. The
code reads:

tensorboard/tensorboard/plugins/projector/vz_projector/data.ts, lines 444 to 447 at commit 9270699:

    private async computeKnn(
        data: DataPoint[],
        nNeighbors: number): Promise<knn.NearestEntry[][]> {
      if (this.nearest != null && nNeighbors <= this.nearest.length) {

Here the condition nNeighbors <= this.nearest.length is meant to check
that the cached KNN result was computed with a value of nNeighbors at
least as large as the one currently requested. But this.nearest.length
is actually the number of data points, not the number of neighbors
computed for each point.

To verify, add

    console.log(
        `Found ${nNeighbors}-nearest: shape is ` +
        `(${this.nearest.length}, ${this.nearest[0].length})`
    );

below line 384 in projectUmap. Then, run UMAP, and subsequently re-run
UMAP with a higher value of k. The output is (e.g.)

    Found 15-nearest: shape is (1024, 15)
    Found 23-nearest: shape is (1024, 15)

which is wrong, because after attempting to find the 23-nearest
neighbors we only have 15 elements for each data point.

I’m not sure why this never hits a hard error anywhere in the
pipeline (implicit conversion of undefined to 0/NaN somewhere?), but
it definitely causes observable effects. To observe them, patch in #2080*
to prevent t-SNE from running automatically when its tab is selected, then:

- in one tab, load the projector page and run t-SNE with
  perplexity=100;
- in another tab, load the projector page and run t-SNE with
  perplexity=8, then re-run it with perplexity=100.

These two tabs should yield approximately the same projection. Instead,
the projection in the first tab converges to a normal result, whereas
the projection in the second tab diverges, leaving points far apart
with little visible structure.

* Tested at commit b0310cd.
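On the open question of why the stale cache never raises a hard error: the snippet below does not diagnose the projector pipeline. It only illustrates, under the assumption that some downstream code reads or slices more neighbors than a cached row holds, the standard JavaScript behaviors that would let such a mismatch pass silently. The variable names are purely illustrative.

```typescript
// Pretend this is one cached KNN row holding only 3 neighbor distances.
const row: number[] = [0.1, 0.2, 0.3];

// Reading past the end of an array yields undefined instead of throwing.
const missing: number | undefined = row[5];

// slice() clamps to the array length, so requesting more entries than
// exist silently returns fewer.
const firstFive = row.slice(0, 5);

// Arithmetic involving undefined produces NaN, which then propagates.
const sum = row[0] + (missing as number);

console.log(missing);           // undefined
console.log(firstFive.length);  // 3
console.log(Number.isNaN(sum)); // true
```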


wchargin · 2019-04-01T23:57:48Z

Assigning @cannoneyed.

shashvatshahi1998 · 2019-04-10T10:08:42Z

@wchargin Can I work on this? Could you explain the issue in a bit more detail?

cannoneyed · 2019-04-30T17:24:12Z

Hey @wchargin, now that everything is sorted on the rest of the UMAP end, I'm gonna take care of this. Hopefully I'll have a PR for you by EOD.

Fixes #2082.

wchargin added type:bug plugin:projector labels Apr 1, 2019

wchargin assigned cannoneyed Apr 2, 2019

cannoneyed mentioned this issue Apr 30, 2019

Fix issue where precomputed knn aren't being reused #2171

Merged

wchargin closed this as completed in #2171 May 22, 2019

wchargin pushed a commit that referenced this issue May 22, 2019

Fix issue where precomputed knn aren't being reused (#2171)

00495e1

Fixes #2082.
