Add more optimal dk order optional arg#2512
Conversation
stack-info: PR: #2512, branch: drisspg/stack/37
6e3bafe to
191258c
Compare
stack-info: PR: Dao-AILab#2512, branch: drisspg/stack/37
stack-info: PR: #2512, branch: drisspg/stack/37
191258c to
2c40834
Compare
stack-info: PR: #2512, branch: drisspg/stack/37
2c40834 to
bf9ab5e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf9ab5eae8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
stack-info: PR: #2512, branch: drisspg/stack/37
bf9ab5e to
9ac0e9b
Compare
stack-info: PR: #2512, branch: drisspg/stack/37
9ac0e9b to
e6a0b29
Compare
stack-info: PR: #2512, branch: drisspg/stack/37
e6a0b29 to
a1897d6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a1897d6400
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
stack-info: PR: #2512, branch: drisspg/stack/37
stack-info: PR: Dao-AILab#2512, branch: drisspg/stack/37
| values = (*values, None, None, None, None) | ||
| values = (*values, None, None, None, None, None) | ||
| elif len(values) == 4: | ||
| values = (*values, None, None) |
There was a problem hiding this comment.
I think this is incorrect, for example in the case where full_block_cnt, full_block_idx, dq_write_order_full, dq_kv_order are None: we'd get
(mask_block_cnt, mask_block_idx, dq_write_order, None, None, None, None)instead of
(mask_block_cnt, mask_block_idx, None, None, dq_write_order, None, None)|
I had the same idea here. Instead of walking partial blocks first and then full blocks, we can merge them into one list and launch/visit them in a specified order. This feels closer to the dense-attention pattern. |
stack-info: PR: #2512, branch: drisspg/stack/37
Stacked PRs:
Add more optimal dk order optional arg.
What does this do
Builds on the last PR. In the last PR we only allow for 1 iteration order through column space, LtoR or RtoL (spt or not as where spt is defined for causal)
This updates / changes the way we expressed iteration order along kv columns in the backwards, to allow for an arbitrary permutation. I like this for a few reasons. Its optional and for common things asc or dsc is good. You can basically craft the optimal schedule for whatever blocksparse impl you have.
I had claude do a rough greedy scheduling herusitic for presumed semaphor wait time and we can see it does help. Although sliding window is still getting instanly slow determ times vs non determ time still.
Sliding window perf issue
For the sliding window slowdown, this case is annoying but has a funnish solution that recovers a lot of the perf.