Merged

Add CB #38085

110 commits
4d2c0f9
stash for now
ArthurZucker Jan 13, 2025
1dbce45
initial commit
ArthurZucker Jan 15, 2025
123ea1f
small updated
ArthurZucker Jan 15, 2025
ae310b2
up
ArthurZucker Jan 15, 2025
b89d03b
up
ArthurZucker Jan 15, 2025
db898a2
works!
ArthurZucker Jan 16, 2025
fe77659
nits and fixes
ArthurZucker Jan 16, 2025
1fbff28
don't loop too much
ArthurZucker Jan 16, 2025
7056401
finish working example
ArthurZucker Jan 16, 2025
4761c8d
update
ArthurZucker Jan 16, 2025
5639730
fix the small freeblocks issue
ArthurZucker Jan 16, 2025
7aba0a0
feat: stream inputs to continuous batch
McPatate Apr 18, 2025
ade3159
fix: update attn from `eager` to `sdpa`
McPatate Apr 18, 2025
b18e8f7
refactor: fmt
McPatate Apr 18, 2025
fadfb64
refactor: cleanup unnecessary code
McPatate Apr 23, 2025
9ef6e92
feat: add `update` fn to `PagedAttentionCache`
McPatate Apr 23, 2025
b0592cf
feat: broken optimal block size computation
McPatate Apr 23, 2025
c7484cc
fix: debugging invalid cache logic
McPatate Apr 24, 2025
a89c534
fix: attention mask
McPatate Apr 29, 2025
09f415a
refactor: use custom prompts for example
McPatate Apr 29, 2025
6c749b5
feat: add streaming output
McPatate Apr 30, 2025
ef809bf
fix: prefill split
McPatate May 1, 2025
f4c7602
fix: send decoded tokens when `prefilling_split` -> `decoding`
McPatate May 1, 2025
45857de
refactor: move logic to appropriate parent class
McPatate May 1, 2025
8629a5e
fix: remove truncation as we split prefilling anyways
McPatate May 2, 2025
93a1016
feat: add paged attention forward
McPatate May 8, 2025
bf03fa3
push Ggraoh>
ArthurZucker May 8, 2025
3d57cc3
add paged sdpa
ArthurZucker May 9, 2025
0e8b1f3
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 9, 2025
768788f
update
ArthurZucker May 9, 2025
899e2c7
btter mps defaults
ArthurZucker May 9, 2025
4b6e9b3
feat: add progress bar for `generate_batch`
McPatate May 9, 2025
476621e
feat: add opentelemetry metrics (ttft + batch fill %age)
McPatate May 12, 2025
8a201e2
feat: add tracing
McPatate May 12, 2025
5c859ad
Add cuda graphs (#38059)
ArthurZucker May 12, 2025
0fb48e8
revert llama changes
ArthurZucker May 12, 2025
e2b4a89
fix merge conflicts
ArthurZucker May 12, 2025
967a084
fix: tracing and metrics
McPatate May 12, 2025
30cf2f8
my updates
ArthurZucker May 12, 2025
ffb7c41
update script default values
ArthurZucker May 12, 2025
c52edd8
fix block allocation issue
ArthurZucker May 12, 2025
1f52f87
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 12, 2025
a535e53
fix prefill split attnetion mask
ArthurZucker May 13, 2025
b62086b
no bugs
ArthurZucker May 13, 2025
026b9ef
add paged eager
ArthurZucker May 13, 2025
917ca13
fix
ArthurZucker May 13, 2025
fa1bfa3
update
ArthurZucker May 13, 2025
a861b2d
style
ArthurZucker May 13, 2025
4010d07
feat: add pytorch traces
McPatate May 13, 2025
259c542
fix
ArthurZucker May 13, 2025
685a422
fix
ArthurZucker May 13, 2025
3401b19
refactor: remove pytorch profiler data
McPatate May 13, 2025
497e057
style
ArthurZucker May 14, 2025
a0874b8
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 14, 2025
a2f8bbe
nits
ArthurZucker May 14, 2025
be9f683
Merge branch 'main' of github.com:huggingface/transformers into feat/…
ArthurZucker May 14, 2025
ee81e51
cleanup
ArthurZucker May 14, 2025
fdf319a
draft test file
ArthurZucker May 14, 2025
0c8868c
fix
ArthurZucker May 14, 2025
6c2e01a
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 14, 2025
e0c6113
fix
ArthurZucker May 14, 2025
57a3ae7
fix paged and graphs
ArthurZucker May 14, 2025
1c67666
small renamings
ArthurZucker May 14, 2025
0fa7bb0
cleanups and push
ArthurZucker May 14, 2025
54dd8b7
refactor: move tracing and metrics logic to utils
McPatate May 14, 2025
624e00e
refactor: trace more blocks of code
McPatate May 15, 2025
c6d8168
nits
ArthurZucker May 16, 2025
562f2d3
nits
ArthurZucker May 16, 2025
69f307d
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 16, 2025
3cf8e08
update
ArthurZucker May 16, 2025
86762a6
to profile or not to profile
ArthurZucker May 16, 2025
6ac6000
refactor: create new output object
McPatate May 16, 2025
d47ab92
causal by default
ArthurZucker May 18, 2025
eff9d66
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 18, 2025
294ed69
cleanup but generations are still off for IDK what reason
ArthurZucker May 18, 2025
e4abe36
simplifications but not running still
ArthurZucker May 18, 2025
9d79be9
this does work.
ArthurZucker May 19, 2025
c719293
small quality of life updates
ArthurZucker May 19, 2025
ad78b20
nits
ArthurZucker May 19, 2025
b080295
updaet
ArthurZucker May 19, 2025
3d9045e
fix the scheduler
ArthurZucker May 19, 2025
afbf7c8
fix warning
ArthurZucker May 19, 2025
7f80c03
ol
ArthurZucker May 19, 2025
9be5439
fully fixed
ArthurZucker May 19, 2025
268fa52
nits
ArthurZucker May 19, 2025
27b550c
different generation parameters
ArthurZucker May 19, 2025
0b5c1e9
nice
ArthurZucker May 19, 2025
aba184e
just style
ArthurZucker May 19, 2025
de20a84
feat: add cache memory usage
McPatate May 19, 2025
71616eb
feat: add kv cache free memory
McPatate May 19, 2025
3d1ed43
feat: add active/waiting count & req latency
McPatate May 19, 2025
1c0ef44
do the sampling
ArthurZucker May 20, 2025
938a012
Merge branch 'feat/stream_inputs_to_continuous_batch' of github.com:h…
ArthurZucker May 20, 2025
6dad2d3
fix: synchronize CUDA only if available and improve error handling in…
ArthurZucker May 20, 2025
b05c857
fix on mps
ArthurZucker May 20, 2025
ff5b08a
feat: add dashboard & histogram buckets
McPatate May 20, 2025
5f619da
perf: improve waiting reqs data structures
McPatate May 20, 2025
0b50324
attempt to compile, but we should only do it on mps AFAIK
ArthurZucker May 21, 2025
6b9a107
feat: decouple scheduling logic
McPatate May 21, 2025
ab3d348
just a draft
ArthurZucker May 22, 2025
cca3009
c;eanup and fixup
ArthurZucker May 22, 2025
0039ba1
optional
ArthurZucker May 22, 2025
2243fad
style
ArthurZucker May 22, 2025
206f1fa
update
ArthurZucker May 22, 2025
c537d01
update
ArthurZucker May 22, 2025
3d4709c
remove the draft documentation
ArthurZucker May 22, 2025
0e34470
fix import as well
ArthurZucker May 22, 2025
6f4ecd3
update
ArthurZucker May 22, 2025
db22dd5
fix the test
ArthurZucker May 22, 2025
5a76a27
style doomed
ArthurZucker May 22, 2025
4 changes: 4 additions & 0 deletions examples/metrics-monitoring/README.md
@@ -0,0 +1,4 @@
# Metrics Monitoring

## Continuous Batching Metrics in Transformers
