reduce GQA test combinations #22918

Merged 1 commit on Nov 21, 2024

@tianleiwu tianleiwu commented Nov 21, 2024

Description

  • Reduce GQA test combinations to save about 35 minutes of test time in CI pipelines.
  • Show the latency of transformers tests.
  • Use a fixed seed in the DMMHA test to avoid random failures.
  • For test_flash_attn_rocm.py, change the skip condition from "has cuda ep" to "not has rocm ep", so that the tests do not run in a CPU build.
  • For test_flash_attn_cuda.py, move the flash attention and memory efficient attention tests into separate classes, so that an entire test suite can be skipped instead of checking the condition in each test.
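
For readers unfamiliar with the pattern, here is a minimal sketch of a class-level skip and a seeded test using the standard unittest module. It is not the code from this PR: AVAILABLE_PROVIDERS, has_provider, the class names, and the seed value are hypothetical stand-ins, and the real tests ask ONNX Runtime which execution providers are available.

```python
import random
import unittest

# Hypothetical stand-in for provider detection (assumption for this sketch).
# Here we pretend to be on a CPU-only build.
AVAILABLE_PROVIDERS = ["CPUExecutionProvider"]


def has_provider(name):
    """Return True if the named execution provider is available."""
    return name in AVAILABLE_PROVIDERS


# Skipping at class level: the condition is "ROCm EP is missing", so the
# suite is skipped on CPU-only builds as well as on CUDA builds.
@unittest.skipUnless(has_provider("ROCMExecutionProvider"), "requires ROCm EP")
class TestRocmAttention(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


class TestSeededInputs(unittest.TestCase):
    def test_reproducible_inputs(self):
        # Seeding the generator makes the "random" inputs identical on
        # every run, which removes one source of flaky failures.
        random.seed(123)
        first = [random.random() for _ in range(4)]
        random.seed(123)
        second = [random.random() for _ in range(4)]
        self.assertEqual(first, second)


def run_all():
    """Run both test classes and return the unittest.TestResult."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestRocmAttention))
    suite.addTests(loader.loadTestsFromTestCase(TestSeededInputs))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With one decorator per class, every test in the class is reported as skipped when the provider is missing, so individual tests no longer need their own availability check.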

Motivation and Context

It takes too long to run GQA tests in CI pipelines since there are too many combinations.
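
To illustrate why the number of combinations grows so quickly, and one generic way to cut it down, here is a small sketch. The parameter names and values are illustrative assumptions, and the lockstep selection shown is not necessarily the strategy this PR uses.

```python
import itertools

# Hypothetical parameter grid; the values are illustrative only.
GRID = {
    "batch_size": [1, 3, 5],
    "seq_len": [1, 127, 2048],
    "num_heads": [6, 9, 32],
    "head_size": [32, 64, 128],
}


def full_combinations(grid):
    """Yield every combination (Cartesian product) of the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def reduced_combinations(grid):
    """Step through all parameters in lockstep.

    Every value of every parameter appears at least once, but the number
    of cases equals the longest value list instead of the product.
    """
    keys = list(grid)
    rounds = max(len(grid[k]) for k in keys)
    for i in range(rounds):
        yield {k: grid[k][i % len(grid[k])] for k in keys}


full = list(full_combinations(GRID))
reduced = list(reduced_combinations(GRID))
print(len(full), len(reduced))  # prints "81 3"
```

The trade-off is coverage: the reduced set exercises each value but not each interaction between values, so a full sweep may still be worth running outside the CI pipelines.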

Linux GPU CUDA CI Pipeline

Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After: 150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: 1424 seconds (0:23:44)

Linux CPU CI Pipeline

Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)

  • 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
  • 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
  • 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)

  • 0.97s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
  • 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
  • 2.41s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: 374 seconds (0:06:14).

Windows GPU CUDA CI Pipeline

Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35)
Time Saved: 330 seconds (0:05:30)
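
As a quick cross-check of the figures above, the snippet below recomputes the time saved per pipeline and the total from the reported before and after durations. The dictionary layout and the helper name fmt are choices made for this sketch.

```python
# (before, after) wall-clock seconds, copied from the pipeline results above.
RUNS = {
    "Linux GPU CUDA": (1954.64, 530.38),
    "Linux CPU": (467.04, 93.41),
    "Windows GPU CUDA": (605.48, 275.48),
}


def fmt(seconds):
    """Format seconds as h:mm:ss, rounded to the nearest second."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


saved = {name: before - after for name, (before, after) in RUNS.items()}
total_saved = sum(saved.values())

for name, seconds in saved.items():
    print(f"{name}: {round(seconds)} seconds ({fmt(seconds)})")
print(f"Total: {round(total_saved)} seconds ({fmt(total_saved)})")
# Total: 2128 seconds (0:35:28)
```

The total across the three pipelines is consistent with the "about 35 minutes" stated in the description.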

@tianleiwu tianleiwu merged commit 8d99b1a into main Nov 21, 2024
93 checks passed
@tianleiwu tianleiwu deleted the tlwu/reduce_mha_gqa_test_combinations branch November 21, 2024 20:26