Switch to upstream Triton compiler, and related changes #36
Merged
51 commits
All commits by xinyazhang:

- 8d5ba3f A new Triton compiler sans CUDA support.
- 003b06e Fix the compiler for new Triton
- cd9615c Mitigate compiler bug (https://github.com/ROCm/triton/issues/596)
- 77fa1e1 add wheel as another required package.
- 579bf82 Port to performant kernel and moving away from block pointers for tl.…
- 2a6872a Fix the off_h_k computation.
- c62c904 Fix writing to encoded_softmax
- 8be265d Submit the Triton kernel as we are testing. All UTs passed
- e150f54 remove debugging output
- 713e87a cpp tuning: Add basic C++ tuning support
- 36ca9fe v2src/flash/attn_fwd: add missing num_head_q and num_head_k
- 63cc8fb Flash API now returns selected psels and copts to extra arguments, if…
- fdd14e1 Implement tune_flash with AOT kernels
- 0e988f9 Fix the dropout_mask and add a progressbar to test/tune_flash.py
- c4c201f Save memory for long seq length
- 8528c7a Update the tuning database for MI200 only GPUs
- 1d4fbd0 Remove seqlen_q/k >= 32k rows from the database
- dd6a26b Fix CMakeLists. Do not pass empty string as cmd argument if GENERATE_…
- 98c404c Return hipErrorSharedObjectSymbolNotFound for untuned cases.
- ec12934 Fix test/test_backward.py
- 4513dfe Fix AUTOTUNE_KEYS for backward kernels.
- 350b6bb tritonsrc: add type annotation 'i32' to num_seqlens, and fix varlen h…
- ece99b8 fix the assignment of .num_head_q/k
- b3f9dab Add Navi 31/32 compiler options.
- d33cf43 Fix various problems and now most fwd kernel tests passed.
- ad33017 Various fixes to tune_flash
- 09583e2 Make zstd quite
- 87262c1 Add draft document 'How To Generate Tuning Database.md'
- f95d878 doc -> docs
- a5a3189 Debugging output in bwd kernel
- c091ef3 add num_head_q/k argument to varlen's attention module.
- 34ac678 tritonsrc/performance_forward: read env var N_CTX to determine testin…
- 22c3197 Reduce the tuning time since there are too many cases to test...
- 0b40af3 cpp autotune: x2 num_warps if warp_size == 32
- 7457a5a Navi32: skip autotune configs that takes too long to build
- b7d647c Add --use_multigpu to test/tune_flash.py for multi-GPU tuning
- e819160 test/tune_flash.py: actually distribute tensor/computing to different…
- b8c702d Move dev-only packages from requirements.txt into requirements-dev.txt
- b5869a1 tune_flash.py: Fix the slow splice_pipes
- c4f0de5 Fix single GPU script.
- 8558d73 Move database accessing to a separate process, and unify the task gen…
- 18b56c0 tune_flash: add --json_file, improve --dry_run to report total numbers,
- fcfa3e8 tune_flash: Move the testing to a separate process to avoid segfault.
- f7b1f28 Cache the minesweeping process to avoid creating processes repeatedly
- a150716 Remove 16k from seqlen_q/k, record task id and skipped tests in json
- d93ceef tune_flash: add --continue_from_json_file
- 7d224cd table_tool: skip result=skipped json objects
- 6ef9a40 tuning_database: Update FLASH$attn_fwd for gfx90a and gfx942
- e92f7d1 track aotriton-hyperjump branch in third_party/triton
- 7ed9ac6 Fix test/performance_forward.py
- bb1a5e8 Remove old_compile.py
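Commit 0b40af3 ("cpp autotune: x2 num_warps if warp_size == 32") states a simple rule for the C++ autotune configurations. The Python sketch below only restates that rule; the function name `adjust_num_warps` is hypothetical and is not taken from the repository. One plausible motivation, which the PR itself does not state, is that CDNA GPUs such as gfx90a/gfx942 run 64-wide wavefronts while Navi 31/32 run 32-wide ones, so doubling `num_warps` keeps the number of threads per block unchanged.

```python
def adjust_num_warps(num_warps: int, warp_size: int) -> int:
    """Return the num_warps value to use for a given hardware warp size.

    Hypothetical helper illustrating commit 0b40af3: when the warp
    (wavefront) size is 32, num_warps is doubled; otherwise it is kept.
    """
    if warp_size == 32:
        return num_warps * 2
    return num_warps


if __name__ == "__main__":
    for warp_size in (64, 32):
        warps = adjust_num_warps(4, warp_size)
        # Threads per block = num_warps * warp_size; both cases give 256 here.
        print(f"warp_size={warp_size}: num_warps={warps}, threads={warps * warp_size}")
```

With a base of 4 warps, a wave64 GPU keeps 4 warps (256 threads) and a wave32 GPU gets 8 warps (also 256 threads).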
| Diff line change |
|---|
| `@@ -1,4 +1,10 @@` |
| `__pycache__/` |
| `build/` |
| `*build*/` |
| `*.swp` |
| `tritonsrc/tune-*.json` |
| `*.csv` |
| `*.png` |
| `1` |
| `2` |
| `1.*` |
| `2.*` |
| Diff line change |
|---|
| `@@ -0,0 +1,12 @@` |
| `# TL;DR` |
| |
| `` ``` `` |
| `mkdir cpptune_build` |
| `cd cpptune_build` |
| `cmake .. -DCMAKE_INSTALL_PREFIX=./install_dir -DCMAKE_BUILD_TYPE=Release -DAOTRITON_BUILD_FOR_TUNING=ON -G Ninja` |
| `# Optionally only build for one arch` |
| `# cmake .. -DCMAKE_INSTALL_PREFIX=./install_dir -DCMAKE_BUILD_TYPE=Release -DAOTRITON_BUILD_FOR_TUNING=ON -DTARGET_GPUS=Navi32 -G Ninja` |
| `ninja install` |
| `cd ..` |
| `PYTHONPATH=cpptune_build/bindings/ python test/tune_flash.py --bias_type 0 --db_file v2python/rules/tuning_database.sqlite3` |
| `` ``` `` |
File renamed without changes.
| Diff line change |
|---|
| `@@ -0,0 +1,3 @@` |
| `-r requirements.txt` |
| `tqdm` |
| `textual` |
| Diff line change |
|---|
| `@@ -4,3 +4,4 @@ packaging` |
| `pluggy` |
| `numpy` |
| `setuptools` |
| `wheel` |