Releases: AI-Hypercomputer/JetStream
v0.2.2
Key Changes
- Enable observability in JetStream Server (Prometheus metrics)
- Enable JAX profiler support on single-host JetStream Server
- Support both text and token-ID I/O for the JetStream Decode API
- Add health check API
- Support MLPerf evaluation
- Enable JetStream Server E2E tests
- Increase unit test coverage (>=96%)
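Since this release adds Prometheus metrics to the server, here is a minimal sketch of consuming the Prometheus text exposition format that such a metrics endpoint serves. The metric names and sample payload below are illustrative assumptions, not JetStream's actual metric names.

```python
# Parse the Prometheus text exposition format into a {metric: value} dict.
# Simplified: ignores HELP/TYPE comment lines and assumes no spaces inside
# label values. Metric names here are hypothetical, for illustration only.

def parse_prometheus_text(payload: str) -> dict[str, float]:
    metrics = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip comments and blank lines
            continue
        name_part, _, value = line.rpartition(" ")
        metrics[name_part] = float(value)
    return metrics

sample = """\
# HELP jetstream_decode_requests_total Hypothetical request counter.
# TYPE jetstream_decode_requests_total counter
jetstream_decode_requests_total 42
jetstream_decode_batch_utilization 0.875
"""
print(parse_prometheus_text(sample))
```

In practice a monitoring agent scrapes the server's metrics port on an interval; the parsing above is what happens to each scraped payload.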
What's Changed
- Accuracy eval mlperf by @jwyang-google in #76
- Add metadata metrics by @yeandy in #77
- Fix pad_tokens function description by @FanhaiLu1 in #80
- Prometheus Metrics by @Bslabe123 in #71
- Update JetStream grpc proto to support I/O with text and token ids by @JoeZijunZhou in #78
- Update benchmark script to easily test llama-3 by @bhavya01 in #83
- Unit test coverage cleanup by @JoeZijunZhou in #81
- Allow tokenizer to customize stop_tokens by @qihqi in #84
- Decode Batch Percentage Metrics/Improved Scraping by @Bslabe123 in #82
- Bump requests from 2.31.0 to 2.32.0 in the pip group across 1 directory by @dependabot in #86
- Add profiling support and update docs by @JoeZijunZhou in #85
- Add ray disaggregated serving support by @FanhaiLu1 in #87
- Ensure server warmup before benchmark by @JoeZijunZhou in #91
- Add healthcheck support for JetStream by @vivianrwu in #90
- Add JetStream E2E test CI by @JoeZijunZhou in #89
- Release v0.2.2 by @JoeZijunZhou in #95
New Contributors
- @jwyang-google made their first contribution in #76
- @Bslabe123 made their first contribution in #71
- @vivianrwu made their first contribution in #90
Full Changelog: v0.2.1...v0.2.2
v0.2.1
Key Changes
- Support Llama3 tokenizer
- JetStream Tokenizer refactor
- Disaggregation preparation work
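The tokenizer refactor in this release introduced an abstract tokenizer class (PR #53) so the engine can swap implementations such as the new Llama3 tokenizer (PR #67). A toy sketch of what such an interface might look like; the class and method names are illustrative assumptions, not JetStream's actual API:

```python
# Hypothetical abstract tokenizer interface with one trivial implementation.
from abc import ABC, abstractmethod


class Tokenizer(ABC):
    """Common interface so the engine is agnostic to the tokenizer backend."""

    @abstractmethod
    def encode(self, text: str) -> list[int]: ...

    @abstractmethod
    def decode(self, token_ids: list[int]) -> str: ...


class CharTokenizer(Tokenizer):
    """Trivial concrete tokenizer: one token per character (codepoint)."""

    def encode(self, text: str) -> list[int]:
        return [ord(c) for c in text]

    def decode(self, token_ids: list[int]) -> str:
        return "".join(chr(t) for t in token_ids)


tok = CharTokenizer()
ids = tok.encode("hi")
print(ids, tok.decode(ids))  # → [104, 105] hi
```

A real backend (SentencePiece, Llama3's tokenizer) would implement the same two methods, which is what lets the serving code stay unchanged across models.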
What's Changed
- add sample_idx in InputRequest for debugging by @morgandu in #32
- Update README.md with user guides by @JoeZijunZhou in #34
- Update README.md with PT user guide by @JoeZijunZhou in #35
- Reorganize unit tests and update CICD by @JoeZijunZhou in #37
- Add badges for JetStream by @JoeZijunZhou in #38
- Bump idna from 3.6 to 3.7 by @dependabot in #39
- Reformat benchmark metrics by @yeandy in #42
- Update server host default value by @JoeZijunZhou in #43
- Refactor readme by @FanhaiLu1 in #41
- Add missing Documentation by @FanhaiLu1 in #47
- Update README.md to fix broken link by @charbull in #50
- Add np padded token support by @FanhaiLu1 in #49
- Format token utils and test by @FanhaiLu1 in #51
- Align Tokenizer in JetStream by @JoeZijunZhou in #40
- Do nothing for nd array in copy_to_host_async by @FanhaiLu1 in #52
- Add jax_padding support driver and server lib by @FanhaiLu1 in #54
- Update maxtext user guide by @JoeZijunZhou in #56
- Fix benchmark script type issue by @JoeZijunZhou in #59
- Fix requester flag default value by @JoeZijunZhou in #60
- Fix float division by zero in benchmark by @FanhaiLu1 in #62
- Register IFRT proxy backend when proxy is defined in the jax_platforms by @zhihaoshan-google in #63
- Add an abstract class for Tokenizer by @bhavya01 in #53
- refactor slice_to_num_chips to adapt to Cloud config by @zhihaoshan-google in #65
- Support llama3 tokenizer by @bhavya01 in #67
- Prerequisite work for supporting disaggregation by @zhihaoshan-google in #68
- Create __init__.py in Jetstream/third_party by @bhavya01 in #69
- Add tokenize_and_pad function for backward compatibility by @FanhaiLu1 in #70
- Release v0.2.1 by @JoeZijunZhou in #72
- Bump tqdm from 4.66.1 to 4.66.3 in the pip group across 1 directory by @dependabot in #73
- Release v0.2.1 with docs update by @JoeZijunZhou in #74
New Contributors
- @dependabot made their first contribution in #39
- @yeandy made their first contribution in #42
- @charbull made their first contribution in #50
- @zhihaoshan-google made their first contribution in #63
- @bhavya01 made their first contribution in #53
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Major Changes
- Support JetStream MaxText inference on Cloud TPU VM
- Support JetStream PyTorch inference on Cloud TPU VM
- Support Continuous Batching with interleaved mode in JetStream
- Support online serving benchmarking
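Continuous batching with interleaving means decode slots never sit idle: when one request finishes, its slot is immediately refilled from the queue rather than waiting for the whole batch to drain. A simplified sketch of the idea; the slot count and the per-request "remaining tokens" model are illustrative assumptions, not JetStream's actual scheduler:

```python
# Toy continuous-batching loop: each request is modeled only by how many
# tokens it still needs; one loop iteration decodes one token per active slot.
from collections import deque


def continuous_batching(lengths: list[int], num_slots: int = 2) -> list[int]:
    """lengths[i] = tokens request i still needs; returns completion order."""
    queue = deque(range(len(lengths)))
    remaining = list(lengths)
    slots: list[int] = []
    finished: list[int] = []
    while queue and len(slots) < num_slots:  # fill the initial batch
        slots.append(queue.popleft())
    while slots:
        for req in list(slots):  # one decode step across the active batch
            remaining[req] -= 1
            if remaining[req] == 0:
                slots.remove(req)
                finished.append(req)
                if queue:  # interleave: refill the freed slot immediately
                    slots.append(queue.popleft())
    return finished


print(continuous_batching([3, 1, 2]))  # → [1, 0, 2]
```

The short request (index 1) finishes first and its slot is handed to request 2 mid-batch, which is the throughput win over static batching, where the slot would stay empty until request 0 also finished.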
What's Changed
- Add unit tests CI github action by @JoeZijunZhou in #1
- Refine thread in orchestrator by @JoeZijunZhou in #2
- Optimize maximum threads to saturate decoding capacity by @JoeZijunZhou in #3
- Add benchmarks maximum threads config by @JoeZijunZhou in #4
- First support necessary for MaxText by @rwitten in #5
- Support gracefully stopping orchestrator and server by @JoeZijunZhou in #6
- Save request outputs and add eval accuracy support by @FanhaiLu1 in #8
- Use parameter based num as inference request max output length by @FanhaiLu1 in #10
- Fix output token drop issue by @JoeZijunZhou in #9
- Add option to warm up by @qihqi in #11
- Replace token_list with generated_text in saved outputs by @FanhaiLu1 in #12
- Refine requester util by @JoeZijunZhou in #15
- Add filtering for sharegpt based on conversation starter by @patemotter in #17
- Allow more requests than available data by @patemotter in #19
- Fix starvation with async server and interleaving optimization by @JoeZijunZhou in #13
- Add Token util unit test by @FanhaiLu1 in #20
- Fix llama2 decode bug in tokenizer by @FanhaiLu1 in #22
- Fix whitespace replacement bug by @FanhaiLu1 in #24
- Update benchmark to run openorca dataset by @morgandu in #21
- Add model ckpt conversion and AQT scripts for JetStream MaxText Serving by @JoeZijunZhou in #23
- Refactor to sample before tokenize by @morgandu in #26
- Update ckpt conversion scripts by @JoeZijunZhou in #25
- move tokenizer model to third party llama2 by @FanhaiLu1 in #27
- Support JetStream MaxText user guide by @JoeZijunZhou in #28
- Enable pylint linter and pyink formatter by @JoeZijunZhou in #29
- Update README by @JoeZijunZhou in #30
- Release v0.2.0 by @JoeZijunZhou in #31
New Contributors
- @JoeZijunZhou made their first contribution in #1
- @rwitten made their first contribution in #5
- @FanhaiLu1 made their first contribution in #8
- @qihqi made their first contribution in #11
- @patemotter made their first contribution in #17
- @morgandu made their first contribution in #21
Full Changelog: https://github.com/google/JetStream/commits/v0.2.0