
Degraded performance on zero-shot clustering #4

Open

NBitBuilder opened this issue Feb 15, 2025 · 3 comments


@NBitBuilder

Thank you so much for creating the model and providing the weights for the community. Your contribution to this field is truly commendable!

I was able to reproduce the tutorial results, which suggests that I deployed the model correctly.

However, when I tested the embeddings using the HER2+ breast cancer dataset (a widely recognized benchmark with region annotations, as outlined here), the results were disappointing.

The embeddings performed poorly, yielding very low silhouette index (SI) and Adjusted Rand Index (ARI) scores across all slides, as shown in this table:

[Image: table of SI and ARI scores for scGPT-spatial embeddings across all slides]
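For reference, here is a minimal sketch of the zero-shot evaluation protocol (cluster the frozen embeddings with KMeans, then score against the region annotations). The file names `embeddings.npy` and `regions.npy` are placeholders, not files from this repository:

```python
# Minimal sketch of the zero-shot evaluation: cluster frozen embeddings
# with KMeans and score against pathologist region annotations.
# "embeddings.npy" and "regions.npy" are hypothetical placeholder files.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

embeddings = np.load("embeddings.npy")                 # (n_spots, dim) spot embeddings
regions = np.load("regions.npy", allow_pickle=True)    # (n_spots,) region labels

k = len(np.unique(regions))                            # one cluster per annotated region
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

print("ARI:", adjusted_rand_score(regions, pred))
print("SI: ", silhouette_score(embeddings, pred))
```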

In comparison, I tested other baseline models, such as spatial transcriptomics-based methods (e.g., SpaGCN) and H&E-based models (e.g., UNI-h, Virchow2), all in a zero-shot manner. The results were significantly better; for instance, Virchow2 achieved:

[Image: table of SI and ARI scores for Virchow2 embeddings]

Even for the spatial transcriptomics methods (all zero-shot), such as SpaGCN, we have:

sample H1, metric ARI:
KMeans (0.264)
Louvain (0.218)
SpaGCN (0.319)
BayesSpace (0.374)

sample G2, metric ARI:
KMeans on gene expression (0.171)
Louvain on gene expression (0.161)
SpaGCN without image (0.126)
BayesSpace without image (0.173)
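For completeness, the gene-expression baselines above follow the standard scanpy workflow. A minimal sketch of the Louvain variant, assuming a hypothetical `her2_sample.h5ad` file with region annotations stored in `adata.obs["region"]`:

```python
# Sketch of the Louvain-on-gene-expression baseline.
# "her2_sample.h5ad" and obs["region"] are hypothetical placeholders.
import scanpy as sc
from sklearn.metrics import adjusted_rand_score

adata = sc.read_h5ad("her2_sample.h5ad")

# Standard preprocessing: normalize, log-transform, PCA, kNN graph.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15)

sc.tl.louvain(adata, resolution=1.0)  # requires the python-louvain package

print("ARI:", adjusted_rand_score(adata.obs["region"], adata.obs["louvain"]))
```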

Given that other users have also expressed concerns about the model's performance (#3 (comment)), these unexpected results may warrant a more comprehensive evaluation by the authors.

For reference, to reproduce my results, I have attached the embeddings from scGPT-spatial and Virchow2 here:

vrichow2.zip
scgpt.zip

@ChloeXWang
Collaborator

Hi @NBitBuilder, thank you for using our methods!

Please see the related replies in #3 (comment). To summarize, we will release a finetuning tutorial for Visium brain slides in the next few weeks. The finetuning workflow is unsupervised and hence comparable to specialized methods such as SpaGCN. Please use the output from the finetuned model instead; it should yield reasonable results.

Hope this helps clarify your questions.

@NBitBuilder
Author

Thank you for your explanation. I will definitely try your new weights once they are released.

In the meantime, could you please clarify the differences between the current pretrained model and the forthcoming fine-tuned model? How do they differ in terms of training, and under what conditions should I use the pretrained weights versus the fine-tuned weights?

@NBitBuilder
Author

You mentioned that "we will release a finetuning tutorial for Visium brain slides in the next few weeks." Will the HER2+ breast dataset also require a separately fine-tuned model for Visium breast slides? If so, I don't believe this qualifies as a foundation model, since it would require an individual fine-tuned model for each dataset.
