Add distributed tests to run-readme-pr.yml (#1466)
* Add distributed tests to run-readme-pr.yml

Need to ensure this is the right runner; @lessw2020, can you please have a look? torchchat uses the same runners as pytorch.

* Update run-docs

Remove HF login because tokens are not available as a git secret

* Update run-docs

Replace llama3.1 with open-llama to avoid the need for a token.
If this turns out to run too long, we can switch to stories110M

* Update run-docs

open-llama -> stories.
mikekgfb authored Jan 27, 2025
1 parent 9686c79 commit 5684175
Showing 2 changed files with 24 additions and 1 deletion.
3 changes: 2 additions & 1 deletion .ci/scripts/run-docs
@@ -129,7 +129,8 @@ fi
 if [ "$1" == "distributed" ]; then

 echo "::group::Create script to run distributed"
-python3 torchchat/utils/scripts/updown.py --file docs/distributed.md > ./run-distributed.sh
+python3 torchchat/utils/scripts/updown.py --file docs/distributed.md --replace 'llama3.1:stories110M,-l 3:-l 2' --suppress huggingface-cli,HF_TOKEN > ./run-distributed.sh
+python3 torchchat/utils/scripts/updown.py --file docs/distributed.md --suppress huggingface-cli,HF_TOKEN > ./run-distributed.sh
 # for good measure, if something happened to updown processor,
 # and it did not error out, fail with an exit 1
 echo "exit 1" >> ./run-distributed.sh
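
Note on the `--replace` argument in this hunk: the value `'llama3.1:stories110M,-l 3:-l 2'` reads like a comma-separated list of `old:new` pairs. That format is inferred from the argument alone, not from the source of `updown.py`. A minimal sketch of applying such a spec to one line of text, where `apply_replacements` and the sample input line are hypothetical and only illustrate the assumed format:

```shell
#!/usr/bin/env bash
# apply_replacements is a hypothetical helper, NOT updown.py's implementation.
# Assumed spec format: comma-separated old:new pairs, split on the first colon.
apply_replacements() {
  local spec=$1 text=$2 pair old new
  local -a pairs
  IFS=',' read -r -a pairs <<< "$spec"
  for pair in "${pairs[@]}"; do
    old=${pair%%:*}              # text before the first colon
    new=${pair#*:}               # text after the first colon
    text=${text//"$old"/$new}    # quoted pattern, so it is matched literally
  done
  printf '%s\n' "$text"
}

# The sample line is made up for illustration.
result=$(apply_replacements 'llama3.1:stories110M,-l 3:-l 2' 'run llama3.1 -l 3')
echo "$result"   # prints: run stories110M -l 2
```

Under that assumption, the spec swaps the gated model for `stories110M` and changes `-l 3` to `-l 2` in the commands extracted from the doc.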
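
Two shell behaviours in the `run-docs` hunk are worth spelling out. As the hunk reads, both generator invocations redirect with `>` into the same file, so the second one truncates whatever the first wrote. The appended `exit 1` then acts as a sentinel: it is only reached if the generated script does not exit on its own first. The sketch below illustrates both points, assuming (this is not confirmed by the diff) that a fully generated script ends with its own `exit 0`. `gen` is a hypothetical stand-in for the generator.

```shell
#!/usr/bin/env bash
# gen is a hypothetical stand-in for the generator; it is NOT updown.py.
workdir=$(mktemp -d)
script="$workdir/run-demo.sh"

gen() {
  # Emit a tiny script; the trailing "exit 0" (an assumption) marks a complete run.
  printf 'echo "variant: %s"\nexit 0\n' "$1"
}

gen replaced > "$script"     # first write
gen plain    > "$script"     # ">" truncates: only this output remains
echo "exit 1" >> "$script"   # sentinel, unreachable after "exit 0"

first_line=$(head -n 1 "$script")
ok_status=0
bash "$script" > /dev/null || ok_status=$?

# Simulate a generator that stopped early without reporting an error.
printf 'echo "partial"\n' > "$script"
echo "exit 1" >> "$script"
bad_status=0
bash "$script" > /dev/null || bad_status=$?

echo "first line: $first_line"                 # first line: echo "variant: plain"
echo "complete script status: $ok_status"      # complete script status: 0
echo "truncated script status: $bad_status"    # truncated script status: 1
rm -rf "$workdir"
```

If the intent was for the `--replace` variant to be the one that runs, the second redirect would undo it; only the output of the last `>` survives.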
22 changes: 22 additions & 0 deletions .github/workflows/run-readme-pr.yml
@@ -306,3 +306,25 @@ jobs:
         echo "::endgroup::"
         TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs native
+  test-distributed-cuda:
+    permissions:
+      id-token: write
+      contents: read
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.4"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+        .ci/scripts/run-docs distributed
+        echo "::group::Completion"
+        echo "tests complete"
+        echo "*******************************************"
+        echo "::endgroup::"
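
The `::group::` / `::endgroup::` lines in the job script are GitHub Actions workflow commands that fold the enclosed output into a collapsible section of the job log. As a small sketch of the same pattern, the pairs could be wrapped in a helper so a group is always closed even when the command fails. `group` is a hypothetical function and is not part of this commit or of `run-docs`.

```shell
#!/usr/bin/env bash
# group is a hypothetical helper, not part of the commit.
# It wraps a command in GitHub Actions log-group markers and
# returns the wrapped command's exit status.
group() {
  local title=$1
  shift
  local status=0
  echo "::group::$title"
  "$@" || status=$?
  echo "::endgroup::"
  return "$status"
}

output=$(group "Print machine info" uname -s)
echo "$output"
```

Outside of GitHub Actions the markers are printed as plain text, so the helper is harmless when the script runs locally.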
