splitting list of torch.Tensor causes "repeating" behavior of the output #761
A further look shows that the checksum/hashing process might be the root cause. The checksums/hashes for tensors seem to be the same even when the values of the tensors differ, whereas this is not the case for numpy arrays.
If that's the case, then what you probably need to do is register a serializer for the type:

```python
from pydra.utils.hash import register_serializer, Cache

@register_serializer(torch.Tensor)
def bytes_repr_tensor(obj: torch.Tensor, cache: Cache) -> Iterator[bytes]:
    # Some efficient method for turning the object into a byte sequence to hash
    ...
```

See https://github.com/nipype/pydra/blob/master/pydra/utils/hash.py for examples. If you have an approach that will work with all array-likes, we could update
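As a rough sketch of what such a serializer's body could yield, here is one value-aware byte layout (the helper names `array_like_bytes` and `digest` are mine, and the dtype/shape/buffer scheme is just one reasonable choice, not pydra's actual implementation; for a `torch.Tensor` you would first convert with `obj.detach().cpu().numpy()`):

```python
import hashlib
import numpy as np

def array_like_bytes(arr):
    """Yield byte chunks that reflect dtype, shape, and element data.

    Hypothetical helper sketching a serializer body: two arrays with
    different values (or dtypes, or shapes) produce different streams.
    """
    a = np.ascontiguousarray(arr)
    yield str(a.dtype).encode()  # distinguish float32 vs float64, etc.
    yield str(a.shape).encode()  # distinguish (2, 3) from (3, 2)
    yield a.tobytes()            # the raw element buffer

def digest(arr) -> str:
    """Hash the byte chunks into a short hex digest."""
    h = hashlib.blake2b(digest_size=16)
    for chunk in array_like_bytes(arr):
        h.update(chunk)
    return h.hexdigest()
```

With a scheme like this, tensors with different values hash differently, which is exactly the property the buggy default was missing.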
Hi @effigies, I have coded up a solution locally, but am getting
You'll need to fork the repository, push to a branch on your own fork, and then create a pull request.
It should be the case that, without registering any additional serializers, pydra should behave appropriately when hashing an arbitrary object (however inefficiently). This is the second example where this has broken down; that seems like a pydra bug.
C extension objects are going to be difficult, as they may not be introspectable (or differentiable) in the same way as pure Python objects. In #762 I have suggested that we identify numpy array API objects with a protocol, which should cover many of these use cases.
Perhaps this issue is more about how we are detecting the types of objects, then. Maybe, if we are not confident, we can and should fall back to hashing the pickled bytestream (that should generally work). I believe that if this code had reached the "pickle and hash the bytestream" part, @wilke0818 wouldn't have seen the erroneous behavior that he noticed.
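A minimal sketch of that fallback (the function name `fallback_hash` is my invention, not a pydra API; note that pickle output is only stable for a fixed protocol and library version, which is one reason type-specific serializers are still preferable):

```python
import hashlib
import pickle

def fallback_hash(obj) -> str:
    """Last-resort hash: digest the object's pickled bytestream.

    Works for any picklable object, including C extension types that
    aren't introspectable, but is less efficient than a type-specific
    serializer and sensitive to pickle protocol details.
    """
    data = pickle.dumps(obj, protocol=4)  # pin the protocol for stability
    return hashlib.blake2b(data, digest_size=16).hexdigest()
```

Because pickle captures the object's actual state, two tensors with different values would no longer collide under this fallback.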
Similarly, using other tensor structures doesn't change the behavior.
Notably, using numpy does give the expected behavior (likely as a result of #340)
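The "repeating" output can be illustrated with a toy model of hash-keyed result caching (this is a simplified simulation I wrote, not pydra's actual split/caching code): if every tensor in the split hashes to the same digest, every split task after the first is treated as a cache hit and returns the first task's result.

```python
import hashlib

def broken_hash(obj) -> str:
    # Models the buggy behavior: the digest depends only on the type,
    # so all inputs of the same type collide.
    return hashlib.blake2b(type(obj).__name__.encode(), digest_size=8).hexdigest()

def good_hash(obj) -> str:
    # Models a value-aware digest.
    return hashlib.blake2b(repr(obj).encode(), digest_size=8).hexdigest()

def split_map(func, inputs, hasher):
    """Toy split: run func per input, reusing cached results by input hash."""
    cache = {}
    results = []
    for x in inputs:
        key = hasher(x)
        if key not in cache:  # only computed on a cache miss
            cache[key] = func(x)
        results.append(cache[key])
    return results

inputs = [[1.0], [2.0], [3.0]]
repeated = split_map(lambda v: v[0] * 10, inputs, broken_hash)  # all collide
correct = split_map(lambda v: v[0] * 10, inputs, good_hash)
```

Under `broken_hash` every element of the split returns the first result, reproducing the repeating behavior reported above, while `good_hash` gives one distinct result per input.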