Creating patches and extracting features for [4096 x 4096] #49

nam1410 opened this issue Jun 15, 2023 · 3 comments

nam1410 commented Jun 15, 2023

@Richarizardd @faisalml - I appreciate your insightful work. I have been using CLAM for quite some time, but I have run into the following obstacle:

[Preface] - I use an in-house dataset, and CLAM works fine on it. I recently read your paper and would like to generate the hierarchical attention maps for this custom dataset. I already have the splits and features for [256 x 256] patches, but how do I connect the existing [256 x 256] features to the newly extracted [4096 x 4096] features? I have read the open and closed issues but have not found a clear explanation.

Consider a WSI with ~20000 [256 x 256] patches whose ResNet-50 features are already extracted and stored on my disk using CLAM's scripts. @Richarizardd has mentioned that I have to change [256 x 256] to [4096 x 4096] when creating patches and extracting features. If I do this, is the hierarchy still preserved? For example, if I extract a [4096 x 4096] patch hp1, how do I correlate it with the existing [256 x 256] patches in my directory? Is it via the [x, y] coordinates? Is my understanding of the pre-processing on the right track, or am I missing something?
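For instance, this is how I imagine the coordinate-based linking would work (a rough sketch of my own understanding, not anything from the repo; `coords_256` is a placeholder for the per-slide patch coordinate array that CLAM saves, and `region_size` is my own variable name):

```python
import numpy as np

# Sketch of my understanding: map each existing 256 x 256 patch coordinate
# to the top-left corner of the 4096 x 4096 region that would contain it.
coords_256 = np.array([[0, 0], [256, 0], [4096, 512], [4352, 768]])  # placeholder coordinates

region_size = 4096
parent_region = (coords_256 // region_size) * region_size

print(parent_region)
# [[   0    0]
#  [   0    0]
#  [4096    0]
#  [4096    0]]
```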

In addition to this, where do I find the ViT-16 features pretrained on TCGA (ref)? Is it from

from vision_transformer import vit_small

and do I use this instead of resnet_custom in the feature extraction? Or is it from

features_cls256 = []

Please correct me if I am wrong @Richarizardd @faisalml. Thank you.

@Anivader

Hi @nam1410,

If you just want to get the 4K features, you can follow this notebook: https://github.com/mahmoodlab/HIPT/blob/master/HIPT_4K/HIPT_4K%20Inference%20%2B%20Attention%20Visualization.ipynb. Basically, you take the 4096 x 4096 image regions as input and extract the corresponding 192-dim embedding from ViT_4K-256.

This is my understanding of the HIPT 4K feature-extraction process. @Richarizardd, please correct me if I am wrong -

For the 4K model, start with a 3 x 4096 x 4096 (RGB) region. You want to convert this into a sequence of 256 x 256 patches by reshaping it as 3 x 16 (w_256) x 16 (h_256) x 256 x 256, which can be written as B x 3 x 256 x 256, where B = 1 x 16 (w_256) x 16 (h_256) = 256. So "B" here should be viewed as the number of patches.
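In code, that reshaping step would look roughly like this (a minimal PyTorch sketch of the step I described, not the actual HIPT implementation):

```python
import torch

region = torch.rand(1, 3, 4096, 4096)  # one RGB 4096 x 4096 region

# Cut the region into a 16 x 16 grid of non-overlapping 256 x 256 patches.
patches = region.unfold(2, 256, 256).unfold(3, 256, 256)              # [1, 3, 16, 16, 256, 256]
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, 256, 256)  # [256, 3, 256, 256]

print(patches.shape)  # torch.Size([256, 3, 256, 256]) -> B = 256 patches
```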

Now each of these B = 256 patches is passed into ViT_16-256, which yields an embedding of dimension 384. So, for the entire 4096 x 4096 region, you end up with an embedding tensor of shape [256, 384].

This can now be reshaped as 1 x 384 x 16 (w_256) x 16 (h_256), which is the input to ViT_256-4096. The output is then an embedding tensor of shape [1 x 192] for the whole region.
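Putting the last two steps into code, a rough sketch would be (again just my understanding; `vit_4k` below is a stand-in for the pretrained ViT_256-4096, and I fake the ViT_16-256 outputs with random numbers):

```python
import torch

# Stand-in for the 256 patch embeddings that ViT_16-256 would produce: [256, 384].
feats_256 = torch.rand(256, 384)

# Re-grid the patch embeddings as a 1 x 384 x 16 x 16 "image" of tokens
# (assuming row-major patch order over the 16 x 16 grid).
grid_256 = feats_256.reshape(1, 16, 16, 384).permute(0, 3, 1, 2)  # [1, 384, 16, 16]

# Stand-in for ViT_256-4096, which returns a 192-dim region-level embedding.
vit_4k = lambda x: torch.rand(x.shape[0], 192)

feat_4k = vit_4k(grid_256)
print(feat_4k.shape)  # torch.Size([1, 192]) -> one embedding per 4096 x 4096 region
```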


nam1410 commented Jul 12, 2023

" Basically, you will need the 4096 x 4096 image regions as input...."
Thank you for your response, @Anivader. My question is more focused on how to get those [4096 x 4096] features.
Can you have a look at the question again?

@clemsgrs

They use the CLAM preprocessing pipeline to extract (4096, 4096) regions.
You can take a look at https://github.com/clemsgrs/hs2p, where I re-implemented and tweaked the CLAM preprocessing code.
