Bump PyTorch version to 2.2 #1364

Merged
merged 1 commit into from
Feb 26, 2024

@will-cromar
Contributor

will-cromar commented Feb 26, 2024

Also bump ecosystem packages (torchtext, torchvision, torchaudio) to latest versions

@will-cromar
Contributor Author

cc @djherbis

@djherbis merged commit 3434de7 into Kaggle:main on Feb 26, 2024
1 check passed
@djherbis
Contributor

Thanks!

@will-cromar
Contributor Author

I saw that the Jenkins workflow failed but can't see the reason. Feel free to revert this PR if you need to.

I experimented in my own notebook and ran into this error:

2024-02-26 22:23:13.206001: I external/xla/xla/pjrt/pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.32. The framework PJRT API version is 0.40.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1708986193.206053     185 pjrt_computation_client.cc:158] Non-OK-status: tpu_status status: INVALID_ARGUMENT: Mismatched PJRT plugin PJRT API version (0.32) and framework PJRT API version 0.40).

It looks like it's getting the wrong libtpu somewhere, but the bundled libtpu does exist: /usr/local/lib/python3.10/site-packages/torch_xla/lib/libtpu.so

@will-cromar
Contributor Author

It looks like TPU_LIBRARY_PATH is getting overridden to the libtpu package:

(screenshot not preserved)

In the 2.2 release, we intentionally don't write that value, but we do treat it as an override: https://github.com/pytorch/xla/blob/v2.2.0/torch_xla/__init__.py#L98-L125
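The override behaviour described above can be sketched roughly as follows. This is not the torch_xla code at the linked lines; `resolve_libtpu` and `BUNDLED_LIBTPU` are hypothetical names, and the only assumptions taken from the thread are that `TPU_LIBRARY_PATH` wins when it is set and that the bundled library lives at the path quoted earlier.

```python
import os

# Path of the bundled library as quoted in this thread (illustrative default).
BUNDLED_LIBTPU = "/usr/local/lib/python3.10/site-packages/torch_xla/lib/libtpu.so"


def resolve_libtpu(env, bundled_path=BUNDLED_LIBTPU):
    """Return (path, reason) for the libtpu that would be picked.

    Hypothetical helper: assumes a non-empty TPU_LIBRARY_PATH acts as an
    override and that the bundled copy is used otherwise.
    """
    override = env.get("TPU_LIBRARY_PATH")
    if override:
        return override, "TPU_LIBRARY_PATH override"
    return bundled_path, "bundled libtpu"


if __name__ == "__main__":
    path, reason = resolve_libtpu(os.environ)
    print(f"libtpu would be loaded from {path} ({reason})")
```

Running this inside the image is a quick way to see whether something in the environment has set `TPU_LIBRARY_PATH` before torch_xla gets a chance to fall back to its bundled copy.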

@djherbis
Contributor

Looks like this is the actual cause of the failure:

#13 90.98 ERROR: Cannot install torch==2.2.0 and torchvision==0.17.1 because these package versions have conflicting dependencies.
#13 90.98
#13 90.98 The conflict is caused by:
#13 90.98     The user requested torch==2.2.0
#13 90.98     torchvision 0.17.1 depends on torch==2.2.1

@will-cromar Can we use 2.2.1?

@djherbis
Contributor

Trying 2.2.1: da1e2ec

@will-cromar
Contributor Author

will-cromar commented Feb 27, 2024

> Trying 2.2.1: da1e2ec

2.2.1 won't work because torch_xla hasn't published a corresponding patch release. What you can do is keep 2.2.0 in the config, then replace torch==${TORCH_VERSION} with torch~=${TORCH_VERSION}. This will install the latest 2.2.x patch release of torch.

Edit: You can also create a separate TORCH_XLA_VERSION. In general, we don't publish patch releases on the same schedule as upstream torch.
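To illustrate why `torch~=2.2.0` resolves to a newer patch release while `torch==2.2.0` does not, here is a minimal sketch of the compatible-release rule (`~=X.Y.Z` behaves like `>=X.Y.Z, ==X.Y.*`). It handles plain numeric versions only, the helper names are hypothetical, and the list of available versions is made up for the example; pip's real resolver covers far more cases.

```python
def parse_release(version):
    """Turn '2.2.1' into (2, 2, 1). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def matches_compatible_release(candidate, base):
    """Approximate `candidate ~= base` for numeric versions.

    `~=2.2.0` is treated as `>=2.2.0` combined with `==2.2.*`.
    """
    ref = parse_release(base)
    if len(ref) < 2:
        raise ValueError("~= needs at least two release segments")
    cand = parse_release(candidate)
    # Pad with zeros so that '2.2' and '2.2.0' compare as equal.
    width = max(len(ref), len(cand))
    ref_padded = ref + (0,) * (width - len(ref))
    cand_padded = cand + (0,) * (width - len(cand))
    prefix = ref[:-1]  # every segment except the last must match exactly
    return cand_padded[: len(prefix)] == prefix and cand_padded >= ref_padded


def latest_compatible(available, base):
    """Pick the highest version in `available` that satisfies `~=base`."""
    matching = [v for v in available if matches_compatible_release(v, base)]
    return max(matching, key=parse_release) if matching else None


if __name__ == "__main__":
    # Hypothetical index contents, for illustration only.
    available = ["2.1.2", "2.2.0", "2.2.1", "2.3.0"]
    print(latest_compatible(available, "2.2.0"))  # prints 2.2.1
```

With a separate `TORCH_XLA_VERSION`, the torch_xla pin could stay exact while torch floats within the 2.2 series.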

@djherbis
Contributor

Thanks, trying that: #1365

@djherbis
Contributor

Nice, it builds at least :)
