[Int4-AWQ] Torch Int-4 AWQ Dequantization and Configuration Options#146

Merged
hegemanjw4amd merged 1 commit into main from hegeman/basic-sdpa-attention-int4-awq-interim on Aug 21, 2024
Conversation

@hegemanjw4amd commented:

This PR adds a fully general Int4-AWQ dequantization function implemented in plain torch, along with environment flags for selecting between the torch and Triton codepaths.
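A torch-only dequantization path of this kind can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the function names, tensor layouts, and the nibble interleaving order are assumptions based on the common AWQ packing convention (eight 4-bit values per int32).

```python
import torch

# Hypothetical sketch of Int4-AWQ dequantization in plain torch.
# Assumed AWQ packing convention: eight 4-bit values per int32, stored
# in the interleaved nibble order [0, 4, 1, 5, 2, 6, 3, 7].
AWQ_REVERSE_ORDER = [0, 4, 1, 5, 2, 6, 3, 7]

def _reverse_awq_order(t: torch.Tensor) -> torch.Tensor:
    """Undo the assumed AWQ nibble interleaving along the last dimension."""
    order = torch.arange(t.shape[-1], dtype=torch.int64, device=t.device)
    order = order.view(-1, 8)[:, AWQ_REVERSE_ORDER].reshape(-1)
    return t[:, order]

def awq_dequantize_torch(qweight: torch.Tensor,
                         scales: torch.Tensor,
                         qzeros: torch.Tensor) -> torch.Tensor:
    """Dequantize AWQ int4 weights using torch ops only.

    qweight: (K, N // 8) int32 packed weights
    scales:  (K // group_size, N) float per-group scales
    qzeros:  (K // group_size, N // 8) int32 packed zero points
    """
    bits = 4
    group_size = qweight.shape[0] // scales.shape[0]
    shifts = torch.arange(0, 32, bits, device=qweight.device)

    # Unpack eight nibbles from each int32, then undo the interleaving.
    iweights = (qweight[:, :, None] >> shifts[None, None, :]).view(qweight.shape[0], -1)
    izeros = (qzeros[:, :, None] >> shifts[None, None, :]).view(qzeros.shape[0], -1)
    iweights = _reverse_awq_order(iweights) & 0xF
    izeros = _reverse_awq_order(izeros) & 0xF

    # Broadcast per-group scales and zero points over each group of K rows.
    scales = scales.repeat_interleave(group_size, dim=0)
    izeros = izeros.repeat_interleave(group_size, dim=0)
    return (iweights - izeros) * scales
```

A Triton kernel does the same unpack-shift-mask arithmetic per tile; the torch version trades kernel-launch efficiency for portability, which is why a runtime flag to choose between the two codepaths is useful.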

Testing: Two HuggingFace models quantized in Int4-AWQ format have been run successfully:

- Qwen2-7B-Instruct-AWQ (latency benchmarking)
- Phi-3-mini-4k-instruct-AWQ (input verification)

For the latter model, specific input prompts were supplied and the output was examined as a sanity check for correctness.

Unit testing is accomplished via tests/kernels/test_awq_triton.py.

Resolves: https://github.com/ROCm/FasterTransformer-Internal/issues/287

@hegemanjw4amd force-pushed the hegeman/basic-sdpa-attention-int4-awq-interim branch 3 times, most recently from 0b78568 to dd9a148, on August 21, 2024 at 10:47

@shajrawi left a comment:

ship it

@hegemanjw4amd force-pushed the hegeman/basic-sdpa-attention-int4-awq-interim branch from dd9a148 to d4332ec on August 21, 2024 at 16:15
@hegemanjw4amd merged commit 4e9830e into main on Aug 21, 2024
@gshtras deleted the hegeman/basic-sdpa-attention-int4-awq-interim branch on September 10, 2024