Add integration with gemlite weight only quant by jerryzh168 · Pull Request #2528 · sgl-project/sglang

jerryzh168 · 2024-12-19T18:35:33Z

Summary:
gemlite is only available with nightly torchao right now (or with torchao installed from source).

Test Plan:

python3 -m sglang.bench_one_batch --model meta-llama/Llama-3.1-8B-Instruct --batch-size 1 --input 1024 --output 512 --json-model-override-args '{"architectures": ["TorchNativeLlamaForCausalLM"]}' --enable-torch-compile --torchao-config gemlite-4-64 --tp-size 1




zhyncs · 2024-12-19T18:42:08Z

Hi @jerryzh168, what is the release cycle of torchao? I can accept using the torchao nightly version; maybe you can try enabling it in https://github.com/sgl-project/sglang/blob/main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock

python/sglang/srt/layers/torchao_utils.py

jerryzh168 · 2024-12-19T18:44:25Z

> Hi @jerryzh168 What is the release cycle of torchao? I can accept using the torchao nightly version, maybe you can try enabling it in the main/python/pyproject.toml. What do you think? cc @merrymercy @Ying1123 @ispobock

We have roughly monthly releases. Yes, depending on the nightly version would be better for now, and I think we can update to a stable version a bit later.

jerryzh168 · 2024-12-19T18:53:32Z

I tried `pip install torchao>=0.8.0.dev20241219`, but it doesn't work. We probably need to use

pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cu124

to install the nightly version. Do we just want to add a version check here?
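
A version check of the kind asked about here could look roughly like the following. This is a minimal sketch, not the code that landed in this PR: the helper names and the `MIN_NIGHTLY_DATE` threshold are hypothetical (the date is borrowed from the nightly build named in the comment), and treating a stable 0.8.0 or any later release as sufficient is an assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical thresholds, taken from the nightly build mentioned in the thread.
MIN_RELEASE = (0, 8, 0)
MIN_NIGHTLY_DATE = 20241219


def parse_torchao_version(ver: str):
    """Split e.g. '0.8.0.dev20241219+cu124' into ((0, 8, 0), 20241219).

    The second element is None when the version has no '.devYYYYMMDD' part.
    """
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    release = tuple(int(part) for part in m.group(1, 2, 3))
    dev_date = int(m.group(4)) if m.group(4) else None
    return release, dev_date


def supports_gemlite(ver: str) -> bool:
    """Assumption: gemlite needs 0.8.0.dev20241219 or anything newer."""
    release, dev_date = parse_torchao_version(ver)
    if release != MIN_RELEASE:
        return release > MIN_RELEASE
    # Same release number: accept a stable build or a recent enough nightly.
    return dev_date is None or dev_date >= MIN_NIGHTLY_DATE


def installed_torchao_supports_gemlite() -> bool:
    """Check the torchao version installed in the current environment."""
    try:
        return supports_gemlite(version("torchao"))
    except PackageNotFoundError:
        return False
```

The check reads the installed package metadata rather than importing torchao, so it can run before any heavy imports happen.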

jerryzh168 · 2024-12-19T23:02:20Z

@zhyncs I think we can land this; it's fine to have it as an experimental feature for now. I added a print asking people to use the torchao nightly.
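
The "print" approach described in this comment can be sketched as a guarded import. This is an illustration only, not the actual change in `torchao_utils.py`: `try_import` and `NIGHTLY_HINT` are hypothetical names, and the hint text simply reuses the install command quoted earlier in the thread.

```python
import importlib

# Hypothetical message; the install command is the one quoted in the thread.
NIGHTLY_HINT = (
    "gemlite quantization requires a nightly torchao build, for example:\n"
    "  pip install --pre torchao --index-url "
    "https://download.pytorch.org/whl/nightly/cu124"
)


def try_import(module_name: str, hint: str):
    """Import a module by name; print `hint` and return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it lands here too.
        print(hint)
        return None
```

A caller would then do something like `torchao = try_import("torchao", NIGHTLY_HINT)` and skip the gemlite path when the result is `None`, which keeps the feature optional while it is experimental.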

jerryzh168 requested review from Ying1123, ispobock, merrymercy and zhyncs as code owners December 19, 2024 18:35


formatting

6aabf04

jerryzh168 added 3 commits December 19, 2024 10:53

format

c448c77

add gemlite dep

21cd459

add error checks

bb9f133

jerryzh168 and others added 2 commits December 19, 2024 15:07

format

88f4ece

Merge branch 'main' into add-gemlite

111e0fe

zhyncs approved these changes Dec 20, 2024


zhyncs merged commit feb2b76 into sgl-project:main Dec 20, 2024

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528

78246d9

zhyncs added a commit that referenced this pull request Dec 21, 2024

fix #2528 (#2541)

4e1e3cf

chosen-ox pushed a commit to chosen-ox/sglang that referenced this pull request Dec 22, 2024

Add integration with gemlite weight only quant (sgl-project#2528)

5773c63

chosen-ox pushed a commit to chosen-ox/sglang that referenced this pull request Dec 22, 2024

fix sgl-project#2528 (sgl-project#2541)

54a9d69

merrymercy mentioned this pull request Dec 26, 2024

torcho gemlite integration #2498

Closed

3 tasks

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Add integration with gemlite weight only quant (sgl-project#2528)

b5a471d

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

fix sgl-project#2528 (sgl-project#2541)

d88ab41

zhyncs merged 7 commits into sgl-project:main from jerryzh168:add-gemlite
