[FP6-LLM] Port splitK map from DeepSpeed #283

gau-nernst · 2024-05-26T22:19:47Z

https://github.com/microsoft/DeepSpeed/blob/3a3a6db3332e339cc9fd94efd4982f6d60635a3d/deepspeed/inference/v2/kernels/core_ops/cuda_linear/cuda_linear.py

Ports the table of optimal splitK values for each (batch_size, out_dim) pair, as measured on A100-80G, from the DeepSpeed file linked above.
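For readers unfamiliar with this kind of table, here is a minimal sketch of how a bucketed (batch_size, out_dim) lookup could work. The bucket width of 64 follows the `# tokens: [1, 64]` comment in the diff. Everything else is an assumption for illustration: the helper name `lookup_split_k`, the fallback of 1, the second bucket, and all keys and values in the example table. None of these are the numbers ported in this PR.

```python
# Illustrative table only: the out_dim keys and splitK values are made-up
# placeholders, NOT the values ported from DeepSpeed in this PR.
_EXAMPLE_SPLIT_K_MAP = [
    {4096: 8, 8192: 4},  # tokens: [1, 64]
    {4096: 4, 8192: 2},  # tokens: [65, 128] (assumed second bucket)
]
_BUCKET_SIZE = 64  # bucket width implied by the "tokens: [1, 64]" comment


def lookup_split_k(batch_size: int, out_dim: int, default: int = 1) -> int:
    """Return the tabulated splitK, or `default` when no entry exists.

    Hypothetical helper: the name, signature, and fallback are assumptions.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Batch sizes 1..64 map to bucket 0, 65..128 to bucket 1, and so on.
    bucket = (batch_size - 1) // _BUCKET_SIZE
    if bucket >= len(_EXAMPLE_SPLIT_K_MAP):
        return default  # batch size beyond the tabulated range
    return _EXAMPLE_SPLIT_K_MAP[bucket].get(out_dim, default)
```

Falling back to a neutral default keeps untabulated shapes working, at the cost of possibly suboptimal speed on those shapes.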

pytorch-bot · 2024-05-26T22:19:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/283

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b89ec10 with merge base 42c2376:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

msaroufim · 2024-05-27T20:33:04Z

torchao/quantization/fp6_llm.py

@@ -111,6 +112,143 @@ def from_tc_float6_e3m2(tensor: Tensor, M: int, N: int, dtype: torch.dtype = tor
    return from_float6_e3m2(tensor_fp6, no_bit_packing=True, dtype=dtype)


+# https://github.com/microsoft/DeepSpeed/blob/3a3a6db3332e339cc9fd94efd4982f6d60635a3d/deepspeed/inference/v2/kernels/core_ops/cuda_linear/cuda_linear.py
+_SPLIT_K_MAP = [
+    {  # tokens: [1, 64]


n00b q: what is meant by token counts here?

Also, can these values be autotuned? I don't necessarily wanna merge in something that's only fast on A100.

gau-nernst · 2024-05-27

I think it's the batch size. Yeah, it would be great if we could autotune the CUDA kernel as well.
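As a rough illustration of the autotuning idea raised in this thread, here is a minimal sketch that picks the fastest splitK from a list of candidates. It is not part of this PR. `time_call` and `autotune_split_k` are hypothetical helpers, and the `benchmark` callable is assumed to be supplied by the caller. For a real CUDA kernel, that callable would also need to synchronize the device before reading the clock.

```python
import time
from typing import Callable, Iterable


def time_call(fn: Callable[[], None], warmup: int = 2, iters: int = 10) -> float:
    """Average wall-clock seconds per call of `fn` (hypothetical helper).

    For GPU work, `fn` itself must block until the kernel has finished,
    otherwise only the launch overhead is measured.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def autotune_split_k(benchmark: Callable[[int], float], candidates: Iterable[int]) -> int:
    """Return the candidate splitK with the lowest measured time.

    `benchmark(split_k)` is assumed to return a duration in seconds.
    """
    timings = {k: benchmark(k) for k in candidates}
    if not timings:
        raise ValueError("no splitK candidates given")
    return min(timings, key=timings.get)
```

Results from such a search could be cached per (batch_size, out_dim) pair and per device, which would address the concern about a table that is only fast on A100.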

port splitK map from DeepSpeed

7aad308

facebook-github-bot added the CLA Signed label May 26, 2024

msaroufim reviewed May 27, 2024

msaroufim self-requested a review May 29, 2024 01:58

Merge branch 'main' into fp6_llm_splitk_map

b89ec10

msaroufim approved these changes May 29, 2024

msaroufim merged commit 6dd63b8 into pytorch:main May 29, 2024
13 checks passed

gau-nernst mentioned this pull request May 29, 2024

FP6 dtype! #208

Open

gau-nernst deleted the fp6_llm_splitk_map branch May 29, 2024 03:59

dbyoung18 pushed a commit to dbyoung18/ao that referenced this pull request Jul 31, 2024

[FP6-LLM] Port splitK map from DeepSpeed (pytorch#283)

8918a13

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

aoti runner test (pytorch#283)

a9e680d

* aoti runner test * change to cpu
