v0.2.0
Major Changes
- Support JetStream MaxText inference on Cloud TPU VM
- Support JetStream Pytorch inference on Cloud TPU VM
- Support Continuous Batching with interleaved mode in JetStream
- Support online serving benchmarking
What's Changed
- Add unit tests CI github action by @JoeZijunZhou in #1
- Refine thread in orchestrator by @JoeZijunZhou in #2
- Optimize maximum threads to saturate decoding capacity by @JoeZijunZhou in #3
- Add benchmarks maximum threads config by @JoeZijunZhou in #4
- Add initial support needed for MaxText by @rwitten in #5
- Support gracefully stopping orchestrator and server by @JoeZijunZhou in #6
- Save request outputs and add eval accuracy support by @FanhaiLu1 in #8
- Use a parameter-based number as the inference request's max output length by @FanhaiLu1 in #10
- Fix output token drop issue by @JoeZijunZhou in #9
- Add option to warm up by @qihqi in #11
- Replace token_list with generated_text in saved outputs by @FanhaiLu1 in #12
- Refine requester util by @JoeZijunZhou in #15
- Add filtering for ShareGPT based on conversation starter by @patemotter in #17
- Allow more requests than available data by @patemotter in #19
- Fix starvation with async server and interleaving optimization by @JoeZijunZhou in #13
- Add Token util unit test by @FanhaiLu1 in #20
- Fix llama2 decode bug in tokenizer by @FanhaiLu1 in #22
- Fix whitespace replacement bug by @FanhaiLu1 in #24
- Update benchmark to run OpenOrca dataset by @morgandu in #21
- Add model checkpoint conversion and AQT scripts for JetStream MaxText Serving by @JoeZijunZhou in #23
- Refactor to sample before tokenizing by @morgandu in #26
- Update checkpoint conversion scripts by @JoeZijunZhou in #25
- Move tokenizer model to third party llama2 by @FanhaiLu1 in #27
- Add JetStream MaxText user guide by @JoeZijunZhou in #28
- Enable pylint linter and pyink formatter by @JoeZijunZhou in #29
- Update README by @JoeZijunZhou in #30
- Release v0.2.0 by @JoeZijunZhou in #31
New Contributors
- @JoeZijunZhou made their first contribution in #1
- @rwitten made their first contribution in #5
- @FanhaiLu1 made their first contribution in #8
- @qihqi made their first contribution in #11
- @patemotter made their first contribution in #17
- @morgandu made their first contribution in #21
Full Changelog: https://github.com/google/JetStream/commits/v0.2.0