
Degraded performance on zero-shot clustering #4

Open

NBitBuilder opened this issue Feb 15, 2025 · 3 comments


@NBitBuilder

Thank you so much for creating the model and providing the weights for the community. Your contribution to this field is truly commendable!

I was able to reproduce the tutorial results, which suggests that I deployed the model correctly.

However, when I tested the embeddings using the HER2+ breast cancer dataset (a widely recognized benchmark with region annotations, as outlined here), the results were disappointing.

The embeddings performed poorly, yielding very low silhouette index (SI) and Adjusted Rand Index (ARI) scores across all slides, as shown in this table:

[Image: table of SI and ARI scores for scGPT-spatial embeddings across all slides]
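For reference, here is a minimal sketch of the zero-shot evaluation protocol (cluster the frozen embeddings with KMeans, then score against the region annotations). The file names `embeddings.npy` and `regions.npy` are placeholders, not files from this repository:

```python
# Minimal sketch of the zero-shot evaluation: cluster frozen embeddings
# with KMeans and score against pathologist region annotations.
# "embeddings.npy" and "regions.npy" are hypothetical placeholder files.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

embeddings = np.load("embeddings.npy")                 # (n_spots, dim) spot embeddings
regions = np.load("regions.npy", allow_pickle=True)    # (n_spots,) region labels

k = len(np.unique(regions))                            # one cluster per annotated region
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embeddings)

print("ARI:", adjusted_rand_score(regions, pred))
print("SI: ", silhouette_score(embeddings, pred))
```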

In comparison, I tested other baseline models, such as spatial transcriptomics-based methods (e.g., SpaGCN) and H&E-based models (e.g., UNI-h, Virchow2), all in a zero-shot manner. The results were significantly better; for instance, Virchow2 achieved:

[Image: table of SI and ARI scores for Virchow2 embeddings]

Even for the spatial transcriptomics methods (all zero-shot), such as SpaGCN, we have:

sample H1, metric ARI:
KMeans (0.264)
Louvain (0.218)
SpaGCN (0.319)
BayesSpace (0.374)

sample G2, metric ARI:
KMeans on gene expression (0.171)
Louvain on gene expression (0.161)
SpaGCN without image (0.126)
BayesSpace without image (0.173)
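For completeness, the gene-expression baselines above follow the standard scanpy workflow. A minimal sketch of the Louvain variant, assuming a hypothetical `her2_sample.h5ad` file with region annotations stored in `adata.obs["region"]`:

```python
# Sketch of the Louvain-on-gene-expression baseline.
# "her2_sample.h5ad" and obs["region"] are hypothetical placeholders.
import scanpy as sc
from sklearn.metrics import adjusted_rand_score

adata = sc.read_h5ad("her2_sample.h5ad")

# Standard preprocessing: normalize, log-transform, PCA, kNN graph.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15)

sc.tl.louvain(adata, resolution=1.0)  # requires the python-louvain package

print("ARI:", adjusted_rand_score(adata.obs["region"], adata.obs["louvain"]))
```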

Given that other users have also expressed concerns about the model's performance (#3 (comment)), these unexpected results may warrant a more comprehensive evaluation by the authors.

For reference, to reproduce my results, I have attached the embeddings from scGPT-spatial and Virchow2 here:

vrichow2.zip
scgpt.zip

@ChloeXWang
Collaborator

Hi @NBitBuilder, thank you for using our methods!

Please see the related replies in #3 (comment). To summarize, we will release a finetuning tutorial for Visium brain slides in the next few weeks. The finetuning workflow is unsupervised and hence comparable to specialized methods such as SpaGCN. Please use the output from the finetuned model instead; it should yield reasonable results.

Hope this helps clarify your questions.

@NBitBuilder
Author

Thank you for your explanation. I will definitely try your new weights once they are released.

In the meantime, could you please clarify the differences between the current pretrained model and the forthcoming fine-tuned model? How do they differ in terms of training, and under what conditions should I use the pretrained weights versus the fine-tuned weights?

@NBitBuilder
Author

You mentioned that "we will release a finetuning tutorial for Visium brain slides in the next few weeks." Will the HER2+ breast dataset also require a separately fine-tuned model for Visium breast slides? If so, I don't believe this qualifies as a foundation model, since it would require an individual fine-tuned model for each dataset.
