Added cpu support for llama generate.py/eval.py #1307
Conversation
Dr. CI: ✅ No failures as of commit 569f0b5 with merge base 6234116. See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1307.
Great, thanks! Can you also paste the results of testing some quant options, like int8wo?
@jainapurva looks good, I think we can merge now?
Setting the numpy version to the range required by gguf: https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/pyproject.toml
* add pp_dim, distributed, num_gpus, num_nodes as cmd line args
* add tp_dim
* add elastic_launch
* working, can now launch from cli
* Remove numpy < 2.0 pin to align with pytorch (pytorch#1301). Fixes pytorch#1296; aligns with https://github.com/pytorch/pytorch/blame/main/requirements.txt#L5
* Update torchtune pin to 0.4.0-dev20241010 (pytorch#1300)
* Unbreak gguf util CI job by fixing numpy version (pytorch#1307): set the numpy version to the range required by gguf (https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/pyproject.toml)
* Remove apparently-unused import torchvision in model.py (pytorch#1305)
* remove global var for tokenizer type + patch tokenizer to allow list of sequences
* make pp tp visible in interface
* Add llama 3.1 to dist_run.py
* [WIP] Move dist inf into its own generator
* Add initial generator interface to dist inference
* Added generate method and placeholder scheduler
* use prompt parameter for dist generation
* Enforce tp>=2
* Build tokenizer from TokenizerArgs
* Disable torchchat format + constrain possible models for distributed
* disable calling dist_run.py directly for now
* Restore original dist_run.py for now
* disable _maybe_parallelize_model again
* Reenable arg.model_name in dist_run.py
* Use singleton logger instead of print in generate
* Address PR comments; try/except in launch_dist_inference; added comments

Co-authored-by: lessw2020 <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
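For reference, the numpy fix amounts to loosening the pin to a range compatible with what gguf-py declares. A sketch of what such a requirements line might look like (the exact bounds are an assumption; the authoritative range is in gguf-py's pyproject.toml linked above):

```
# hypothetical pin; mirror the range declared in gguf-py's pyproject.toml
numpy>=1.17
```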
Updated the CUDA dependencies for memory profiling to make the script compatible with CPU.
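The change boils down to guarding CUDA-only profiling calls behind a device check. A minimal sketch of the pattern (an illustration of the approach, not the actual diff; `run_generation` is a hypothetical stand-in for the benchmark loop):

```python
import torch

def run_generation(device: str) -> None:
    # hypothetical stand-in for the llama generation loop
    x = torch.randn(1024, 1024, device=device)
    for _ in range(8):
        x = x @ x

device = "cuda" if torch.cuda.is_available() else "cpu"

if device == "cuda":
    # reset the CUDA allocator's high-water mark before the run
    torch.cuda.reset_peak_memory_stats()

run_generation(device)

if device == "cuda":
    # peak-memory stats come from the CUDA caching allocator and only
    # exist on GPU; on CPU the report is simply skipped
    print(f"peak memory: {torch.cuda.max_memory_reserved() / 1e9:.2f} GB")
else:
    print("CUDA memory profiling skipped on cpu")
```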
Test plan: ran the script locally on a Mac laptop.
Benchmarks:
python torchao/_models/llama/generate.py --checkpoint_path checkpoints/meta-llama/Llama-3.2-3B/model.pth -q int8wo
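For context on the quant option being benchmarked: int8wo is weight-only int8 quantization. A rough sketch of applying it through the public torchao API (an illustration of the technique, not how generate.py is wired internally; the toy model is hypothetical):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# hypothetical toy model standing in for the llama checkpoint
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16)

# replace Linear weights with int8 weight-only quantized tensors, in place
quantize_(model, int8_weight_only())
```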