
Bump PT 2025131 and ET pins 20250209 #1493

Status: Open (wants to merge 5 commits into main)


pytorch-bot (bot) commented Feb 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1493

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 8625843 with merge base 53a1004:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 11, 2025
Jack-Khuu (Contributor, Author) commented Feb 11, 2025

Failing on https://github.com/pytorch/torchchat/actions/runs/13253427250/job/37044260670?pr=1493

echo "Export and run AOTI (C++ runner)"
python torchchat.py export stories110M --output-aoti-package-path ./model.pt2 --dtype float32 --quantize '{"embedding:wx": {"bitwidth": 2, "groupsize": 32}, "linear:a8wxdq": {"bitwidth": 3, "groupsize": 128, "has_weight_zeros": false}}'
./cmake-out/aoti_run ./model.pt2 -z ./tokenizer.model -t 0 -i "${PRMT}"

Looks like this might be due to an AO mismatch error.
We probably need both this PR and #1458.

Edit: Unrelated

cc: @metascroy

Jack-Khuu (Contributor, Author) commented Feb 11, 2025

Testing locally, isolated from the AO changes, suggests that #1458 is unrelated.

(Just bumping PT causes the runner to fail.)

Jack-Khuu (Contributor, Author) commented Feb 12, 2025

The error occurs when using the AOTI runner with the linked torchao lib. It rolls up to the change in how pytorch/pytorch detects OpenMP, pytorch/pytorch#145870 (cc: @malfet)

Without Brew install: https://github.com/pytorch/torchchat/actions/runs/13273334566/job/37057693025?pr=1493

dyld[8590]: Library not loaded: /opt/homebrew/opt/libomp/lib/libomp.dylib
  Referenced from: <E04D3A6F-A452-31EF-9520-27C6B4140221> /Users/runner/work/torchchat/torchchat/torchao-build/cmake-out/lib/libtorchao_ops_aten.dylib
  Reason: tried: '/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file), '/opt/homebrew/opt/libomp/lib/libomp.dylib' (no such file)
/Users/runner/work/_temp/2d8a0077-1581-4972-9650-72fbb0b54b33.sh: line 6:  8590 Abort trap: 6           ./cmake-out/aoti_run ./model.pt2 -z ./tokenizer.model -t 0 -i "${PRMT}"

With Brew install: https://github.com/pytorch/torchchat/actions/runs/13275987426/job/37065581082?pr=1493

OMP: Error #15: Initializing libomp.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
/Users/runner/work/_temp/53841c20-106a-4387-9bec-5985e8418fa9.sh: line 6:  8899 Abort trap: 6           ./cmake-out/aoti_run ./model.pt2 -z ./tokenizer.model -t 0 -i "${PRMT}"
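One way to narrow this down (a sketch, not something that was run in this thread) is to compare which OpenMP runtime each library asks dyld for. The helper omp_runtime_refs below is hypothetical: it reads otool -L output on stdin and prints each distinct OpenMP runtime path once. The filename patterns it matches (libomp, libiomp5, libgomp) are an assumption.

```shell
# Hypothetical helper: list OpenMP runtime references in `otool -L` output.
# otool -L prints "<file>:" header lines, then one
# "<path> (compatibility version ...)" line per dependency.
omp_runtime_refs() {
  # Skip "<file>:" headers, keep only the dependency path (first field),
  # keep paths whose basename looks like an OpenMP runtime, and dedupe.
  awk '$1 !~ /:$/ { print $1 }' \
    | grep -E '(^|/)lib(omp|iomp5|gomp)[^/]*$' \
    | sort -u
}

# Example usage (macOS):
#   otool -L torchao-build/cmake-out/lib/libtorchao_ops_aten.dylib | omp_runtime_refs
```

Running it against libtorchao_ops_aten.dylib and against the torch libraries the runner loads would show whether they point at different libomp copies, which would be consistent with the Error #15 message above.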

@swolchok I saw you had fun with this last week: pytorch/executorch#8098

Thoughts on how to unblock?

malfet (Contributor) commented Feb 12, 2025


[Edit] How is torchao built? I.e., why does it link itself with libOMP? It should just borrow the dependency from Torch (where it's bundled as part of the nightlies; I just checked that's the case).
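To see the bundled copy malfet mentions, a small hypothetical helper can list the OpenMP runtimes sitting in torch's lib directory. This is a sketch that assumes the nightly places libomp*.dylib directly in torch/lib; the helper name is made up for illustration.

```shell
# Hypothetical helper: list OpenMP runtime dylibs directly inside DIR
# (for example, the lib/ folder of an installed torch package).
find_bundled_omp() {
  local dir="${1:?usage: find_bundled_omp DIR}"
  # -maxdepth 1: only look at DIR itself, not subdirectories.
  find "$dir" -maxdepth 1 \( -name 'libomp*.dylib' -o -name 'libiomp5*.dylib' \) -print | sort
}

# Example usage:
#   TORCH_LIB="$(python -c 'import os, torch; print(os.path.join(os.path.dirname(torch.__file__), "lib"))')"
#   find_bundled_omp "$TORCH_LIB"
```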

Jack-Khuu (Contributor, Author) commented

Pointer to the CMake build of torchao:

install_torchao_aten_ops() {
  local device=${1:-cpu}
  if [[ "$device" == "cpu" ]]; then
    echo "Building torchao custom ops for ATen"
    pushd ${TORCHCHAT_ROOT}/torchao-build/src/ao/torchao/experimental
  elif [[ "$device" == "mps" ]]; then
    echo "Building torchao mps custom ops for ATen"
    pushd ${TORCHCHAT_ROOT}/torchao-build/src/ao/torchao/experimental/ops/mps
  else
    echo "Invalid argument: $device. Valid values are 'cpu' or 'mps'." >&2
    return 1
  fi
  CMAKE_OUT_DIR=${TORCHCHAT_ROOT}/torchao-build/cmake-out
  cmake -DCMAKE_PREFIX_PATH=${MY_CMAKE_PREFIX_PATH} \
    -DCMAKE_INSTALL_PREFIX=${CMAKE_OUT_DIR} \
    -DCMAKE_BUILD_TYPE="Release" \
    -S . \
    -B ${CMAKE_OUT_DIR} -G Ninja
  cmake --build ${CMAKE_OUT_DIR} --target install --config Release
  popd
}


I'm not familiar with the linking.
@metascroy, can you help answer? Jerry is out, but cc: @jcaip to loop in AO.

jcaip commented Feb 12, 2025

cc @malfet: I see there's this line in Utils.cmake that is responsible for the custom linking:

https://github.com/pytorch/ao/blame/d3306b22b0e9cba09762c335757c1dcfbd96f170/torchao/experimental/Utils.cmake#L24

Maybe a noob question: should that be target_link_libraries(${target_name} PRIVATE "${TORCH_LIBRARIES}"), like line 21 above, to borrow the dependency from Torch?
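If the link line is changed along those lines, one way to sanity-check the rebuilt library is to confirm that no OpenMP runtime is referenced by an absolute path like the /opt/homebrew/opt/libomp/lib/libomp.dylib that dyld failed to load earlier in the thread. The helper below is hypothetical. It assumes that a dependency correctly borrowed from Torch shows up as an @rpath or @loader_path entry rather than an absolute one.

```shell
# Hypothetical check: read `otool -L` output on stdin and fail if any
# OpenMP runtime is referenced by an absolute path (e.g. a Homebrew path).
check_no_absolute_omp() {
  local bad
  # Skip "<file>:" headers, keep absolute dependency paths only, then keep
  # the ones that look like OpenMP runtimes. "|| true" keeps a no-match
  # grep from aborting callers that run under `set -e`.
  bad="$(awk '$1 !~ /:$/ && $1 ~ /^\// { print $1 }' \
    | grep -E '/lib(omp|iomp5|gomp)[^/]*$' || true)"
  if [ -n "$bad" ]; then
    printf 'absolute OpenMP reference: %s\n' "$bad" >&2
    return 1
  fi
  return 0
}

# Example usage (macOS):
#   otool -L torchao-build/cmake-out/lib/libtorchao_ops_aten.dylib | check_no_absolute_omp
```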
