
[ROCm] ROCm triton pin update #130625

Closed
jataylo wants to merge 5 commits into main from new-triton-rocm-pin-1207

@jataylo
Collaborator

@jataylo jataylo commented Jul 12, 2024

@pytorch-bot

pytorch-bot Bot commented Jul 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130625

Note: Links to docs will display an error until the docs builds have been completed.

❌ 20 New Failures, 1 Cancelled Job, 2 Unrelated Failures

As of commit dae6a2f with merge base df0494b:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jataylo jataylo added the keep-going label (Don't stop on first failure, keep running tests until the end) on Jul 12, 2024
@jataylo
Collaborator Author

jataylo commented Jul 14, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased new-triton-rocm-pin-1207 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout new-triton-rocm-pin-1207 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the new-triton-rocm-pin-1207 branch from dabfa41 to ea10b40 Compare July 14, 2024 11:29
@pytorch-bot pytorch-bot Bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Jul 15, 2024
@jataylo
Collaborator Author

jataylo commented Jul 15, 2024

Just testing moving the upstream pin as well, so we can find ROCm-exclusive failures.

@jataylo
Collaborator Author

jataylo commented Jul 18, 2024

I'm seeing many sympy-related errors; hopefully these are fixed with a rebase. But we do see 1 or 2 real failures from the commit bump exclusively on ROCm:

TestSelectAlgorithm.test_addmm_fp16 - AssertionError: Incorrect result from choice TritonTemplateCaller(/tmp/tmp0qhe2wis/nl/cnlgjadb7fxqmphrkrgahould4qfakwpppbcamn3y7bd7v6rpu3o.py, ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, B_PROLOGUE_CAST_TYPE=None, EVEN_K=True, GROUP_M=8, matrix_instr_nonkdim=0, num_stages=0, num_warps=4)

This leads to a Tensor-likes mismatch error.

TestCompiledAutograd.test_free_activation_memory - AssertionError: False is not true

But this test was previously disabled, so it may be a flaky or already-resolved issue.

@jataylo
Collaborator Author

jataylo commented Jul 31, 2024

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased new-triton-rocm-pin-1207 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout new-triton-rocm-pin-1207 && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the new-triton-rocm-pin-1207 branch from 88e9d92 to dae6a2f Compare July 31, 2024 12:18
@jataylo
Collaborator Author

jataylo commented Aug 13, 2024

Not planned.

@jataylo jataylo closed this Aug 13, 2024
@github-actions github-actions Bot deleted the new-triton-rocm-pin-1207 branch September 17, 2024 01:54

Labels

ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
keep-going (Don't stop on first failure, keep running tests until the end)
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)


3 participants