
Support llama3 #64

Merged 15 commits into main on May 3, 2024

Conversation

bhavya01
Collaborator

@bhavya01 bhavya01 commented May 2, 2024

Tested with run_interactive.py

```
python run_interactive.py --size=8b --model=llama-3 --batch_size=128 --max_cache_length=2048 --quantize_weights=$quantize --quantize_kv_cache=$quantize --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path
```
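For reference, the shell variables in the command above could be set up roughly as follows. This is a hypothetical sketch, not from the PR: the paths and the `quantize` value are placeholders to substitute for your own checkpoint layout.

```shell
# Hypothetical values for the variables consumed by run_interactive.py above.
# All paths are placeholders; point them at your converted checkpoint.
quantize=True
output_ckpt_dir="$HOME/ckpts/llama-3-8b"
tokenizer_path="$HOME/ckpts/llama-3-8b/tokenizer.model"

# Echo the composed command as a dry run before actually launching it.
echo python run_interactive.py \
  --size=8b --model=llama-3 --batch_size=128 --max_cache_length=2048 \
  --quantize_weights=$quantize --quantize_kv_cache=$quantize \
  --checkpoint_path=$output_ckpt_dir --tokenizer_path=$tokenizer_path
```

Dropping the `echo` runs the command for real once the placeholder paths are replaced.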

Also ran the llama-2 benchmark on TPU v4-8 and got the following numbers:

```
Successful requests:       1999
Benchmark duration:        393.904737 s
Total input tokens:        220485
Total generated tokens:    608985
Request throughput:        5.07 requests/s
Input token throughput:    559.74 tokens/s
Output token throughput:   1546.02 tokens/s
Mean TTFT:                 284112.00 ms
Median TTFT:               284544.72 ms
P99 TTFT:                  368924.02 ms
Mean TPOT:                 5756.67 ms
Median TPOT:               1095.59 ms
P99 TPOT:                  109265.00 ms
```
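As a quick sanity check (not part of the PR itself), the three throughput lines follow directly from the raw totals in the same report:

```shell
# Recompute request, input-token, and output-token throughput from the
# raw benchmark totals: 1999 requests, 220485 input tokens, and 608985
# generated tokens over a 393.904737 s run.
awk 'BEGIN {
  d = 393.904737
  printf "%.2f %.2f %.2f\n", 1999 / d, 220485 / d, 608985 / d
}'
# → 5.07 559.74 1546.02
```

These match the reported 5.07 requests/s, 559.74 tokens/s, and 1546.02 tokens/s.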

Still need to run the benchmark script in the JetStream repo to get the metrics for llama-3.

@bhavya01 bhavya01 requested a review from qihqi May 2, 2024 00:13
@FanhaiLu1
Collaborator

Thanks for adding llama3 support! Accuracy is critical; can you share the output results from both llama2 and llama3 from run_interactive?

@FanhaiLu1 FanhaiLu1 self-requested a review May 2, 2024 16:49
@bhavya01
Collaborator Author

bhavya01 commented May 2, 2024

> Thanks for adding llama3 support! Accuracy is critical; can you share the output results from both llama2 and llama3 from run_interactive?

This is the output for both the models: https://gist.github.com/bhavya01/40a344e671a2e5dde980f163141545db

@FanhaiLu1
Collaborator

> Thanks for adding llama3 support! Accuracy is critical; can you share the output results from both llama2 and llama3 from run_interactive?
>
> This is the output for both the models: https://gist.github.com/bhavya01/40a344e671a2e5dde980f163141545db

Can you use the output without this PR as a baseline for comparison (it's hard to tell whether quality dropped without a baseline)? If possible, can you run both base and test without quantization?

@bhavya01
Collaborator Author

bhavya01 commented May 2, 2024

> Thanks for adding llama3 support! Accuracy is critical; can you share the output results from both llama2 and llama3 from run_interactive?
>
> This is the output for both the models: https://gist.github.com/bhavya01/40a344e671a2e5dde980f163141545db
>
> Can you use the output without this PR as a baseline for comparison (it's hard to tell whether quality dropped without a baseline)? If possible, can you run both base and test without quantization?

Yes, that makes sense. I did the comparison for LLAMA2 without quantization, and both results seem weird: https://gist.github.com/bhavya01/660cd636d678f42a01501d093d63c2b1

With quantization, they both look pretty similar. I added the llama2_before output to this gist: https://gist.github.com/bhavya01/40a344e671a2e5dde980f163141545db

Collaborator

@FanhaiLu1 FanhaiLu1 left a comment


Thanks for adding base vs test comparison for bfloat16 and int8 quantization. Looks good to me now.

Please fix the check errors and feel free to merge after that.

@bhavya01
Collaborator Author

bhavya01 commented May 3, 2024

The unit tests are failing because we test against JetStream v0.2.0. We should have a new JetStream release this week, after which these tests will pass.

@FanhaiLu1
Collaborator

> The unit tests are failing because we test against JetStream v0.2.0. We should have a new JetStream release this week, after which these tests will pass.

@JoeZijunZhou Hi Zijun, could you let us know when you plan to create a new JetStream release? @bhavya01 Given the current test status, we need to tag the latest JetStream release before submitting this PR.

Collaborator

@JoeZijunZhou JoeZijunZhou left a comment


Thank you @bhavya01 and @FanhaiLu1 ! Here is the release for JetStream: AI-Hypercomputer/JetStream#72

@bhavya01 bhavya01 merged commit 137eb47 into main May 3, 2024
3 checks passed
@bhavya01 bhavya01 deleted the llama3 branch May 3, 2024 23:37