🐛 Bug

I discovered this issue while testing PR #655. If you run the Image Embedding README example code, it returns a 3D tensor.
My understanding of embeddings in general, and of how they are used in FiftyOne, is that they are expected to be 1D (one flat vector per embedding).
The reason it returns a 3D tensor is that the output shape depends on the backbone used. The default there is resnet101, which returns a 2048x7x7 tensor. Others, like inception, return a flat 1D tensor, i.e. of length X.
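For illustration, here is one way to see the shape difference directly with torchvision (this uses plain torchvision rather than flash's backbone wiring, so treat it as a sketch of the underlying behavior, not of flash internals):

```python
import torch
from torchvision import models

# Build a resnet101 and drop its final avgpool + fc layers, leaving
# only the convolutional feature extractor (as embedding backbones do).
backbone = models.resnet101()
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

with torch.no_grad():
    features = feature_extractor(torch.randn(1, 3, 224, 224))

print(features.shape)  # torch.Size([1, 2048, 7, 7]) -- a 3D feature map per image
```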
To Reproduce
Steps to reproduce the behavior:
Run the README example, but remove the `embedding_dim` parameter. See below for an example.
Note: as-is, this will error on `print(embeddings.shape)`, regardless of configuration, since `embeddings` is a list. But the question here is around the logic for the `ImageEmbedder`.
Code sample
```python
from flash.core.data.utils import download_data
from flash.image import ImageEmbedder

# 1. Download the data
download_data("https://pl-flash-data.s3.amazonaws.com/hymenoptera_data.zip", "data/")

# 2. Create an ImageEmbedder with resnet50 trained on imagenet.
embedder = ImageEmbedder(backbone="resnet50")

# 3. Generate an embedding from an image path.
embeddings = embedder.predict("data/hymenoptera_data/predict/153783656_85f9c3ac70.jpg")

# 4. Print embeddings shape
print(embeddings.shape)
```
Expected behavior
Expect to see a flat 100352x1 shape tensor as the output (100352 = 2048 × 7 × 7), instead of 2048x7x7.
Environment
- PyTorch Version (e.g., 1.0): 1.9
- OS (e.g., Linux): Linux
- How you installed PyTorch (`conda`, `pip`, source): pip
- Build command you used (if compiling from source): N/A
Additional context
I believe the question is around what the logic should be here:
https://github.com/PyTorchLightning/lightning-flash/blob/075de3a46d74d9fc0e769401063fede1f12d0518/flash/image/embedding/model.py#L85-L92
If `embedding_dim` is None, then the head is `nn.Identity()`. If we desire a flat 1D embedding, then the question is: should `nn.Identity()` change to `nn.Flatten()`?
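For concreteness, a minimal sketch of the difference (the feature shape follows the resnet101 example above; everything else here is illustrative, not flash's actual code):

```python
import torch
from torch import nn

# Stand-in for the backbone output: a resnet101-style feature map for one image.
features = torch.randn(1, 2048, 7, 7)

# Current behavior: with embedding_dim=None the head is nn.Identity(),
# so the 3D feature map passes through unchanged.
identity_head = nn.Identity()
print(identity_head(features).shape)  # torch.Size([1, 2048, 7, 7])

# Proposed behavior: nn.Flatten() collapses everything after the batch
# dimension into one flat vector per image (2048 * 7 * 7 = 100352).
flatten_head = nn.Flatten()
print(flatten_head(features).shape)  # torch.Size([1, 100352])
```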
It could be argued that the user should be left to flatten the output on their own, but per the contributing guidelines, I thought this would align with "Force User Decisions To Best Practices".
Let me know your thoughts. If that makes sense, I can update the code, run some tests, and update the docs in a PR.
Ok, to start, I am no longer able to replicate the error I described above. It now prints:

```
torch.Size([2048])
```

So no issues there.
Note: there is a typo where:

```python
print(embeddings.shape)
```

should be

```python
print(embeddings[0].shape)
```

in the examples and docs. That can be a small separate PR.
I believe @ethanwharris's comment is valid: you want to apply that pooling only if the output dimension is too high AND you don't have an `embedding_dim` assigned.
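A hedged sketch of what that conditional could look like (the `build_head` helper and its signature are hypothetical illustrations, not flash's actual API; the real logic lives in the model.py lines linked above):

```python
import torch
from torch import nn

def build_head(feature_shape, embedding_dim=None):
    """Hypothetical helper: choose a head from the backbone's per-sample output shape."""
    if embedding_dim is not None:
        # An explicit embedding_dim gets its own projection head.
        return nn.Sequential(nn.Flatten(), nn.LazyLinear(embedding_dim))
    if len(feature_shape) > 1:
        # Spatial feature map (e.g. 2048x7x7): pool, then flatten to 1D.
        return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
    # Already flat (e.g. an inception-style backbone): pass through unchanged.
    return nn.Identity()

# resnet101-style output: pooling yields a 2048-element embedding per image,
# matching the torch.Size([2048]) reported above.
head = build_head((2048, 7, 7))
print(head(torch.randn(1, 2048, 7, 7)).shape)  # torch.Size([1, 2048])
```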