Merged (34 commits)
- 91eff7d: Add prompt processing batch and generation batch (angeloskath, Mar 18, 2026)
- 9462b92: Fix ArraysCache merge of empty arrays (angeloskath, Mar 21, 2026)
- 1a555b7: Start a BatchGenerator2 (angeloskath, Mar 23, 2026)
- 2154d16: Fix various bugs and add generation batch extend (angeloskath, Mar 23, 2026)
- db44550: Fix a couple bugs add types and docstrings (angeloskath, Mar 23, 2026)
- 38c31ae: Add stats and use it in batch_generate (angeloskath, Mar 25, 2026)
- baa8c4e: Add cache extraction and time reporting for the benchmark (angeloskath, Mar 25, 2026)
- a36975a: Remove the original batch generator (angeloskath, Mar 25, 2026)
- 366cbde: Fix the generate tests (angeloskath, Mar 25, 2026)
- f5e5745: Fix rotating cache merge with empty and merge with full (angeloskath, Mar 26, 2026)
- b8a5334: Add per sequence stop matcher (angeloskath, Mar 26, 2026)
- 6e26040: Remove forgotten pdb (angeloskath, Mar 26, 2026)
- dca3e4c: Add remove and prompt_cache_nbytes (angeloskath, Mar 26, 2026)
- 7a2b273: Change the SequenceMatcher to a full on state machine (angeloskath, Mar 26, 2026)
- 6225dde: Fix empty stop tokens (angeloskath, Mar 26, 2026)
- d02eb3c: Fix max tokens handling in batch_generate (angeloskath, Mar 26, 2026)
- e85cc9d: Start transitioning the server to the new APIs (angeloskath, Mar 27, 2026)
- 972027a: Move _serve_single to the state machine (angeloskath, Mar 27, 2026)
- d19333d: Running server (angeloskath, Mar 28, 2026)
- ef0df36: Small cleanup (angeloskath, Mar 28, 2026)
- c61d0f8: Ensure that we insert a copy of the list in the cache (angeloskath, Mar 28, 2026)
- c53567f: Chat templates require user message (angeloskath, Mar 28, 2026)
- e24c954: Fix the in reasoning test (angeloskath, Mar 28, 2026)
- f49b34e: Fix the initial state (angeloskath, Mar 28, 2026)
- 6b1a033: Change the copy for prompt batch split (angeloskath, Mar 28, 2026)
- 61fc808: Fixes (angeloskath, Mar 28, 2026)
- efb30e2: Fix qwen 3.5 (angeloskath, Mar 28, 2026)
- c2783f3: Fix batched gated delta net (angeloskath, Mar 28, 2026)
- bef54c3: Remove unused batch dataclass (angeloskath, Mar 28, 2026)
- da0b679: Small refactor of server methods (angeloskath, Mar 29, 2026)
- 4154a53: Stop generation or prompt processing on disconnect (angeloskath, Mar 29, 2026)
- 401b86e: Fix batched deepseek_v32 and GLM (angeloskath, Mar 30, 2026)
- d4e898c: Add cache type logging and fix segment type (angeloskath, Mar 30, 2026)
- ee44cd2: Handle edge case when uid both removed and finished (angeloskath, Mar 30, 2026)
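Several of the commits above ("Add per sequence stop matcher", "Change the SequenceMatcher to a full on state machine", "Fix empty stop tokens") concern detecting stop sequences token-by-token during batched decoding. A minimal sketch of how such a per-sequence stop matcher can be driven as a state machine (a hypothetical illustration under assumed semantics, not the PR's actual SequenceMatcher):

```python
class StopMatcher:
    """Per-sequence stop matcher: feed tokens one at a time and report
    whether a stop sequence just completed, may still be in progress,
    or cannot currently match. Intended to be discarded once matched."""

    def __init__(self, stop_sequences):
        # Ignore empty stop sequences (the PR also fixes the empty
        # stop-token edge case; here we simply drop them).
        self.stops = [tuple(s) for s in stop_sequences if s]
        # In-progress matches as (stop index, next position to check).
        self.partial = []

    def feed(self, token):
        next_partial = []
        # Advance every in-progress match, and also try starting a
        # fresh match of each stop sequence at this token.
        candidates = self.partial + [(i, 0) for i in range(len(self.stops))]
        for stop_idx, pos in candidates:
            stop = self.stops[stop_idx]
            if stop[pos] == token:
                if pos + 1 == len(stop):
                    return "matched"
                next_partial.append((stop_idx, pos + 1))
        self.partial = next_partial
        return "partial" if next_partial else "no_match"
```

Because the matcher's state is per sequence, a batch generator can keep one instance per active sequence and retire a sequence as soon as its matcher reports `"matched"`.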
mlx_lm/benchmark.py (3 additions, 0 deletions)

```diff
@@ -148,10 +148,13 @@ def batch_bench():
     for i in range(args.num_trials):
         if args.delay > 0:
             time.sleep(args.delay)
+        tic = time.perf_counter()
         response = _bench()
+        toc = time.perf_counter()
         responses.append(response)
         results = [(k, getattr(response, k)) for k in report_keys]
         results = [f"{k}={v:.3f}" for k, v in results]
+        results.append(f"total_time={toc - tic:.3f}")
         rprint(f"Trial {i+1}: " + ", ".join(results))

     def avg(k):
```
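The `total_time` addition follows the standard `time.perf_counter` wall-clock pattern. A standalone sketch of the same pattern (generic, not tied to the benchmark's `_bench` helper):

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds) using the
    monotonic high-resolution perf_counter clock."""
    tic = time.perf_counter()
    result = fn(*args, **kwargs)
    toc = time.perf_counter()
    return result, toc - tic


result, elapsed = timed(sum, range(1000))
print(f"total_time={elapsed:.3f}")
```

`perf_counter` is preferred over `time.time` for interval measurement because it is monotonic and unaffected by system clock adjustments.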
mlx_lm/examples/batch_generate_response.py (1 addition, 1 deletion)

```diff
@@ -27,7 +27,7 @@

 # Set `verbose=True` to see generation statistics
 result = batch_generate(
-    model, tokenizer, prompts, verbose=False, return_prompt_caches=True
+    model, tokenizer, prompts, verbose=False, return_prompt_caches=True, max_tokens=2048
 )
 print(result.texts[-1])
```