Thank you so much for creating the model and providing the weights for the community. Your contribution to this field is truly commendable!
I was able to reproduce the tutorial results, which suggests that I deployed the model correctly.
However, when I tested the embeddings using the HER2+ breast cancer dataset (a widely recognized benchmark with region annotations, as outlined here), the results were disappointing.
The embeddings performed poorly, yielding very low silhouette (SI) and Adjusted Rand Index (ARI) scores across all slides, as shown in the table below:
In comparison, I tested other baseline models, including spatial transcriptomics-based methods (e.g., SpaGCN) and H&E-based models (e.g., UNI-h, Virchow2), all in a zero-shot manner. The results were significantly better; for instance, Virchow2 achieved:
Even for the spatial transcriptomics-based methods such as SpaGCN (all evaluated zero-shot), we have:
Sample H1, ARI:
- KMeans: 0.264
- Louvain: 0.218
- SpaGCN: 0.319
- BayesSpace: 0.374

Sample G2, ARI:
- KMeans (on gene expression): 0.171
- Louvain (on gene expression): 0.161
- SpaGCN (without image): 0.126
- BayesSpace (without image): 0.173
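For transparency, the metrics above were computed with a procedure along these lines (a minimal sketch, not my exact script; the file names and embedding format are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Placeholder inputs: per-spot embeddings (n_spots x dim) and the
# pathologist region annotations from the HER2+ benchmark.
embeddings = np.load("her2_H1_embeddings.npy")      # hypothetical file name
labels_true = np.load("her2_H1_region_labels.npy")  # hypothetical file name

# Cluster the embeddings with as many clusters as annotated regions,
# then score agreement with the annotations (ARI) and cluster
# separation in embedding space (silhouette, "SI").
n_regions = len(np.unique(labels_true))
labels_pred = KMeans(n_clusters=n_regions, random_state=0).fit_predict(embeddings)

print("ARI:", adjusted_rand_score(labels_true, labels_pred))
print("SI :", silhouette_score(embeddings, labels_pred))
```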
Given that other users have also expressed concerns about the model's performance (#3 (comment)), these unexpected results may warrant a more comprehensive evaluation by the authors.
For your reference, to reproduce my results I have attached the embeddings from scGPT-spatial and Virchow2 here:
vrichow2.zip
scgpt.zip
Please see the related replies in #3 (comment). To summarize, we will release a finetuning tutorial for Visium brain slides in the next few weeks. The finetuning workflow is unsupervised and is therefore comparable to specialized methods such as SpaGCN. Please use the output from the finetuned model instead; it should yield reasonable results.
Thank you for your explanation. I will definitely try your new weights once they are released.
In the meantime, could you please clarify the differences between the current pretrained model and the upcoming fine-tuned model? How do they differ in terms of training, and under what conditions should I use the pretrained weights versus the fine-tuned weights?
You mentioned that "we will release a finetuning tutorial for Visium brain slides in the next few weeks." Will the HER2 breast dataset also require a separate model fine-tuned on Visium breast slides? Unless I am mistaken, a model that needs individual fine-tuning for each dataset would not really qualify as a foundation model.