Pin torch == 2.6 on PR CI docker images for now #37695

Merged
ydshieh merged 1 commit into main from check_circleci_networkx on Apr 23, 2025

ydshieh (Collaborator) commented Apr 23, 2025

What does this PR do?

torch 2.7 was released today, but it causes import failures on CircleCI (see the error log below).

Let's pin 2.6 for now on CircleCI

error log

  File "/usr/local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 62, in <module>
    from .integrations.flex_attention import flex_attention_forward
  File "/usr/local/lib/python3.9/site-packages/transformers/integrations/flex_attention.py", line 39, in <module>
    from torch.nn.attention.flex_attention import BlockMask, flex_attention
  File "/usr/local/lib/python3.9/site-packages/torch/nn/attention/flex_attention.py", line 15, in <module>
    from torch._dynamo._trace_wrapped_higher_order_op import TransformGetItemToIndex
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/__init__.py", line 53, in <module>
    from .polyfills import loader as _  # usort: skip # noqa: F401
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/polyfills/loader.py", line 25, in <module>
    POLYFILLED_MODULES: tuple["ModuleType", ...] = tuple(
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/polyfills/loader.py", line 26, in <genexpr>
    importlib.import_module(f".{submodule}", package=polyfills.__name__)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/polyfills/builtins.py", line 31, in <module>
    def all(iterable: Iterable[object], /) -> bool:
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/decorators.py", line 427, in wrapper
    rule_map: dict[Any, type[VariableTracker]] = get_torch_obj_rule_map()
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/trace_rules.py", line 2870, in get_torch_obj_rule_map
    obj = load_object(k)
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/trace_rules.py", line 2901, in load_object
    val = _load_obj_from_str(x[0])
  File "/usr/local/lib/python3.9/site-packages/torch/_dynamo/trace_rules.py", line 2885, in _load_obj_from_str
    return getattr(importlib.import_module(module), obj_name)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.9/site-packages/torch/_higher_order_ops/map.py", line 6, in <module>
    from torch._functorch.aot_autograd import AOTConfig, create_joint
  File "/usr/local/lib/python3.9/site-packages/torch/_functorch/aot_autograd.py", line 135, in <module>
    from .partitioners import default_partition
  File "/usr/local/lib/python3.9/site-packages/torch/_functorch/partitioners.py", line 37, in <module>
    from ._activation_checkpointing.graph_info_provider import GraphInfoProvider
  File "/usr/local/lib/python3.9/site-packages/torch/_functorch/_activation_checkpointing/graph_info_provider.py", line 3, in <module>
    import networkx as nx
  File "/usr/local/lib/python3.9/site-packages/networkx/__init__.py", line 19, in <module>
    from networkx import utils
  File "/usr/local/lib/python3.9/site-packages/networkx/utils/__init__.py", line 7, in <module>
    from networkx.utils.backends import *
  File "/usr/local/lib/python3.9/site-packages/networkx/utils/backends.py", line 258, in <module>
    backends = _get_backends("networkx.backends")
  File "/usr/local/lib/python3.9/site-packages/networkx/utils/backends.py", line 234, in _get_backends
    items = entry_points(group=group)
TypeError: entry_points() got an unexpected keyword argument 'group'
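
For context, the final TypeError is consistent with how importlib.metadata behaves on Python 3.9: there, entry_points() takes no arguments and returns a dict keyed by group, while the group keyword is accepted from Python 3.10 onward. The following is a minimal sketch of a version-tolerant lookup, not the code networkx or torch actually ships; the helper name get_entry_points is hypothetical.

```python
import sys
from importlib.metadata import entry_points


def get_entry_points(group):
    """Return the entry points registered under `group` as a list.

    Hypothetical helper for illustration only.
    """
    if sys.version_info >= (3, 10):
        # Python 3.10+: entry_points() accepts selection keywords such as `group`.
        return list(entry_points(group=group))
    # Python 3.9 and earlier: entry_points() takes no arguments and
    # returns a dict mapping each group name to its entry points.
    return list(entry_points().get(group, []))


if __name__ == "__main__":
    # "networkx.backends" is the group name that appears in the traceback.
    for ep in get_entry_points("networkx.backends"):
        print(ep.name)
```

Calling entry_points(group=...) unconditionally, as the traceback shows, only works on the newer interpreters, which is why the failure surfaces on the Python 3.9 image.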

@github-actions github-actions bot marked this pull request as draft April 23, 2025 08:15
@ydshieh ydshieh marked this pull request as ready for review April 23, 2025 08:17
@ydshieh ydshieh force-pushed the check_circleci_networkx branch from f006ff8 to 9792ae4 April 23, 2025 09:09
@ydshieh ydshieh changed the title from "trigger CI" to "Pin torch == 2.6 on PR CI docker images for now" Apr 23, 2025
Cyrilvallez (Member) left a comment

Thanks a lot! Should unblock our workflows!

@ydshieh ydshieh merged commit ca79030 into main Apr 23, 2025
11 checks passed
@ydshieh ydshieh deleted the check_circleci_networkx branch April 23, 2025 09:47
vasqu (Contributor) commented Apr 25, 2025

@ydshieh @Cyrilvallez I traced back where the issue comes from:

The main culprit was introduced in pytorch/pytorch#143539 by using import networkx as nx as seen in the logs. The problem is that

It likely just hasn't been noticed because it is only used in parts where the networkx version is pinned, e.g. https://github.com/pytorch/pytorch/blob/c1c8c1f8d6993e39203d4bebcdf63bef2c867de0/tools/build/bazel/requirements.in#L7 - I would suggest pinning the networkx version (<3.3) and submitting a separate issue in the torch repo. Wdyt?

Edit: It seems more like uv is not picking up the correct versions 😢. #37000 might need to be done on all CI jobs.

vasqu (Contributor) commented Apr 25, 2025

Small update: the issue seems to come from uv ignoring dependency mismatches when installing from whl files, combined with torch not explicitly including this Python dependency in its index. This should be solved by:

  • torch #6575 on torch side to explicitly include this dep in the index
  • uv #13086 on uv side to handle dep mismatches on whls

Should not move too fast and wait if things eventually work out :) cc @ydshieh
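
A mismatch of this kind can be detected after installation. The sketch below is only an illustration under assumptions: it assumes the installed distribution declares a Requires-Python field in its metadata, it only understands simple ">=" clauses, and both helper names are hypothetical. It is not how uv or the torch index resolves dependencies.

```python
import sys
from importlib.metadata import PackageNotFoundError, metadata


def min_python_from_spec(requires_python):
    """Extract the (major, minor) lower bound from a spec such as '>=3.10'.

    Simplification: only '>=' clauses are understood; other clauses are ignored.
    """
    for clause in requires_python.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            parts = clause[2:].strip().split(".")
            return tuple(int(p) for p in parts[:2])
    return None


def installed_package_supports(dist_name, python_version=None):
    """Return True/False for an installed distribution, or None if it is missing."""
    python_version = tuple(python_version or sys.version_info[:2])
    try:
        spec = metadata(dist_name).get("Requires-Python")
    except PackageNotFoundError:
        return None  # distribution is not installed
    if not spec:
        return True  # no declared constraint
    minimum = min_python_from_spec(spec)
    return minimum is None or python_version >= minimum


if __name__ == "__main__":
    # Prints True, False, or None depending on the local environment.
    print(installed_package_supports("networkx"))
```

A check like this could run as an early CI step so that an incompatible wheel fails fast with a clear message instead of a deep import-time traceback.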

ydshieh (Collaborator, Author) commented Apr 29, 2025

Hi @vasqu, thanks a lot for helping investigate 🤗 ❤️.

would suggest pinning the networkx version (<3.3) and submitting a separate issue in the torch repo. Wdyt?

I planned to do this so that we could move to torch 2.7 on CircleCI jobs, but @gante found some regressions and decided to keep the pin, see #37760.

Should not move too fast and wait if things eventually work out :)

Is this a suggestion to us :-) ?

ydshieh (Collaborator, Author) commented Apr 29, 2025

@gante said we can try to use torch 2.7 on CircleCI jobs. I will try to update.

vasqu (Contributor) commented Apr 29, 2025

@ydshieh No problem :) I think with the fixes on the torch side, a pin might not be necessary anymore - tested also locally for py 3.9.

Should not move too fast and wait if things eventually work out :)

That was initially meant because torch was still updating their index, so it should be good now (with uv and torch/networkx).

ydshieh (Collaborator, Author) commented Apr 29, 2025

FYI: #37856

There are a few torch compile issues, but we are working on them in order to move to torch 2.7.

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
pin 2.6 on CircleCi images

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>