Nightly doesn't work in Google Colab #7163

Open
larryliu0820 opened this issue Dec 3, 2024 · 10 comments
Labels
actionable (Items in the backlog waiting for an appropriate impl/fix)
bug (Something isn't working)
module: build (Related to buck2 and cmake build)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

@larryliu0820
Contributor

🐛 Describe the bug

Importing custom ops gives me undefined symbols. Code snippet:

!pip install --extra-index-url https://download.pytorch.org/whl/nightly/ executorch==0.5.0.dev20241203
from executorch.extension.llm.custom_ops import sdpa_with_kv_cache

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/executorch/extension/llm/custom_ops/sdpa_with_kv_cache.py in <module>
     21 try:
---> 22     op = torch.ops.llama.sdpa_with_kv_cache.default
     23     assert op is not None

4 frames
AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
/usr/lib/python3.10/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    372 
    373         if handle is None:
--> 374             self._handle = _dlopen(self._name, mode)
    375         else:
    376             self._handle = handle

OSError: /usr/local/lib/python3.10/dist-packages/executorch/extension/llm/custom_ops/../../pybindings/_portable_lib.cpython-310-x86_64-linux-gnu.so: undefined symbol: xnn_f16_f32acc_gemm_minmax_ukernel_4x16__avx2_broadcast
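
The loader error above ends with an `undefined symbol: <name>` suffix. As a minimal triage sketch, the symbol name can be pulled out of the error text so it can be searched for in the build. The helper name `extract_undefined_symbol` is hypothetical and is not part of ExecuTorch or PyTorch.

```python
import re
from typing import Optional

# Hypothetical helper for triage; not an ExecuTorch or PyTorch API.
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol: (\S+)")


def extract_undefined_symbol(message: str) -> Optional[str]:
    """Return the symbol named in a dlopen 'undefined symbol' error, if any."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


# Error text copied from the traceback in this issue.
message = (
    "/usr/local/lib/python3.10/dist-packages/executorch/extension/llm/"
    "custom_ops/../../pybindings/_portable_lib.cpython-310-x86_64-linux-gnu.so: "
    "undefined symbol: xnn_f16_f32acc_gemm_minmax_ukernel_4x16__avx2_broadcast"
)
print(extract_undefined_symbol(message))
```

The `xnn_` prefix suggests the missing symbol is an XNNPACK microkernel, which would point at how XNNPACK was linked into the wheel.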

Versions

Nightly 0.5.0.dev20241203, on google colab

@larryliu0820
Contributor Author

@dbort dbort added bug Something isn't working actionable Items in the backlog waiting for an appropriate impl/fix module: build Related to buck2 and cmake build triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Dec 3, 2024
@larryliu0820
Contributor Author

I updated torch to nightly 2.6.0.dev20241203 as well, and now see a different undefined-symbol error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-2f4a8ffbaaa5> in <cell line: 1>()
----> 1 from executorch.extension.pybindings import portable_lib

/usr/local/lib/python3.10/dist-packages/executorch/extension/pybindings/portable_lib.py in <module>
     34 #
     35 # Note that all of these are experimental, and subject to change without notice.
---> 36 from executorch.extension.pybindings._portable_lib import (  # noqa: F401
     37     # Disable "imported but unused" (F401) checks.
     38     _create_profile_block,  # noqa: F401

ImportError: /usr/local/lib/python3.10/dist-packages/executorch/extension/pybindings/_portable_lib.cpython-310-x86_64-linux-gnu.so: undefined symbol: sgemm_

@larryliu0820
Contributor Author

This seems to be related to ET_BUILD_WITH_BLAS: https://github.com/pytorch/executorch/blob/main/kernels/optimized/blas/CPUBlas.cpp#L13

I'm trying to dig into how we decide whether to turn it on or off.
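
`sgemm_` is the Fortran-convention name of the single-precision GEMM routine in BLAS, so one quick check is whether the machine has any BLAS shared library that exports it. The following is a rough sketch using only the standard library; the candidate library names are assumptions, and the helper names are hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional


def exports_symbol(lib: object, name: str) -> bool:
    """True if attribute lookup finds `name` (a dlsym lookup for ctypes.CDLL)."""
    return hasattr(lib, name)


def find_blas_with_sgemm() -> Optional[str]:
    """Return the first candidate BLAS library that exports sgemm_, else None."""
    # Candidate names are assumptions; the installed BLAS (if any) varies.
    for candidate in ("blas", "openblas"):
        path = ctypes.util.find_library(candidate)
        if path is None:
            continue
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue
        if exports_symbol(lib, "sgemm_"):
            return path
    return None


print(find_blas_with_sgemm())
```

A result of `None` only means no system BLAS was found under those names. It does not show how the wheel itself was built or linked.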

@larryliu0820
Contributor Author

@kimishpatel do you know if we are doing things right here? It seems the Colab machine doesn't work with CPUBlas?

@mergennachin
Contributor

mergennachin commented Dec 3, 2024

Also cc @tarun292 since you reviewed changes in this area

@mergennachin mergennachin moved this from Ready to In progress in ExecuTorch DevX improvements Dec 17, 2024
@mergennachin mergennachin moved this from In progress to Ready in ExecuTorch DevX improvements Dec 17, 2024
@kimishpatel
Contributor

Ok will follow up

@kimishpatel
Contributor

I think we need to do -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON here https://github.com/pytorch/executorch/blob/main/setup.py#L572

@kimishpatel
Contributor

I am unable to repro this if I build the wheel locally. I can repro it in Colab, though; however, it is harder to figure out a fix there.

@kimishpatel
Contributor

kimishpatel commented Dec 19, 2024

I have 2 observations.

  1. This PR https://github.com/pytorch/executorch/pull/7212/files landed on the 6th, and the last nightly was built on the 6th too. So possibly the changes were not picked up, and somehow the two are linked. So if we have a nightly after that, we should check on that nightly.
  2. If you run !pip install --extra-index-url https://download.pytorch.org/whl/nightly/ torch==2.6.0.dev20241203+cpu (note the +cpu variant, not the general PyTorch build), then it seems to work. So something is strangely off, likely related to torch's load_library? (https://colab.research.google.com/drive/1lyMbXMKsNCMqiVERvXHlXNOIqseyoGnG?usp=sharing)
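
Since the second observation depends on which torch build is installed, it can help to record the variant alongside a repro. Here is a small sketch that reads the local version label (the part after `+`) from a version string; the helper names are hypothetical.

```python
def local_version_label(version: str) -> str:
    """Return the part after '+' in a version string, or '' if absent."""
    _, _, local = version.partition("+")
    return local


def is_cpu_variant(version: str) -> bool:
    """True if the version string carries the '+cpu' local label."""
    return local_version_label(version) == "cpu"


# Version strings taken from this thread.
print(is_cpu_variant("2.6.0.dev20241203+cpu"))
print(is_cpu_variant("2.6.0.dev20241203"))
```

In a Colab cell, one would pass `torch.__version__` to `is_cpu_variant` after importing torch.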

@mergennachin
Contributor

@kimishpatel

> I think we need to do -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON here

So, are you suggesting a fix?

> So if we have a nightly after that, we should check on that nightly.

Interesting, all the nightlies after 12/6 have the same failures.

https://hud.pytorch.org/hud/pytorch/executorch/nightly/1?per_page=50&mergeLF=true (click on the arrow next to "Other" to expand)

See this example: https://ossci-raw-job-status.s3.amazonaws.com/log/pytorch/executorch/34071123005

Search that log for the string "ImportError: cannot import name 'sdpa_with_kv_cache' from 'executorch.extension.llm.custom_ops' (/__w/_temp/conda_environment_12212546649/lib/python3.10/site-packages/executorch/extension/llm/custom_ops/__init__.py"
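
For long CI logs like the one linked above, a short filter can list every line that mentions an `ImportError`. This is only a sketch: the helper name is hypothetical, and the sample lines other than the `ImportError` prefix are made up for illustration.

```python
from typing import Iterable, List, Tuple


def find_import_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, stripped text) for lines with 'ImportError:'."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if "ImportError:" in line
    ]


# Illustrative input; only the ImportError prefix comes from this thread.
sample_log = [
    "installing wheel",
    "ImportError: cannot import name 'sdpa_with_kv_cache' "
    "from 'executorch.extension.llm.custom_ops'",
    "job finished",
]
for number, text in find_import_errors(sample_log):
    print(number, text)
```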

Projects
Status: Ready
Development

No branches or pull requests

4 participants