Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added cpu support for llama generate.py/eval.py #1307

Merged
merged 2 commits into from
Nov 20, 2024
Merged

Conversation

jainapurva
Copy link
Contributor

@jainapurva jainapurva commented Nov 18, 2024

Updated the cuda dependencies for memory profiling, to make script compatible with cpu.

Test plan: Ran script locally on Mac laptop

Benchmarks:

python torchao/_models/llama/generate.py --checkpoint_path checkpoints/meta-llama/Llama-3.2-3B/model.pth -q int8wo

  • Time for inference: 146.36 sec total, 1.37 tokens/sec
  • Average tokens/sec: 1.38
  • Average Bandwidth: 4.43 GB/s
  • Peak Memory Usage: 0.00 GB
  • Model Size: 3.22 GB

@jainapurva jainapurva added the topic: for developers Use this tag if this PR is mainly developer facing label Nov 18, 2024
Copy link

pytorch-bot bot commented Nov 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1307

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 569f0b5 with merge base 6234116 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 18, 2024
Copy link
Contributor

@jerryzh168 jerryzh168 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks! can you also paste result of testing on some quant options, like int8wo?

@jerryzh168
Copy link
Contributor

@jainapurva looks good, I think we can merge now?

@jainapurva jainapurva marked this pull request as ready for review November 20, 2024 17:34
@jainapurva jainapurva merged commit d224653 into main Nov 20, 2024
18 checks passed
sunjiweiswift pushed a commit to sunjiweiswift/ao that referenced this pull request Nov 25, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024
* add pp_dim, distributed, num_gpus, num_nodes as cmd line args

* add tp_dim

* add elastic_launch

* working, can now launch from cli

* Remove numpy < 2.0 pin to align with pytorch (pytorch#1301)

Fix pytorch#1296

Align with https://github.com/pytorch/pytorch/blame/main/requirements.txt#L5

* Update torchtune pin to 0.4.0-dev20241010 (pytorch#1300)

Co-authored-by: vmpuri <[email protected]>

* Unbreak gguf util CI job by fixing numpy version (pytorch#1307)

Setting numpy version to be the range required by gguf: https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/pyproject.toml

* Remove apparently-unused import torchvision in model.py (pytorch#1305)

Co-authored-by: vmpuri <[email protected]>

* remove global var for tokenizer type + patch tokenizer to allow list of sequences

* make pp tp visible in interface

* Add llama 3.1 to dist_run.py

* [WIP] Move dist inf into its own generator

* Add initial generator interface to dist inference

* Added generate method and placeholder scheduler

* use prompt parameter for dist generation

* Enforce tp>=2

* Build tokenizer from TokenizerArgs

* Disable torchchat format + constrain possible models for distributed

* disable calling dist_run.py directly for now

* Restore original dist_run.py for now

* disable _maybe_parallelize_model again

* Reenable arg.model_name in dist_run.py

* Use singleton logger instead of print in generate

* Address PR comments; try/expect in launch_dist_inference; added comments

---------

Co-authored-by: lessw2020 <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: for developers Use this tag if this PR is mainly developer facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants