Draft
103 commits
c54b324
docs
eitanturok Sep 5, 2025
692a066
cleaner
eitanturok Sep 5, 2025
263670f
update
eitanturok Sep 5, 2025
ba7f0db
download pruned vocab
eitanturok Sep 5, 2025
b046119
add logger; fix load_draft_vocab_pruned
eitanturok Sep 5, 2025
0c59e9d
get model device
eitanturok Sep 5, 2025
9880e78
override the lm head
eitanturok Sep 5, 2025
b94836c
grrr spelling mistake
eitanturok Sep 5, 2025
9ebe01f
good print statements
eitanturok Sep 7, 2025
a0ace47
pruned drafter has correct dims
eitanturok Sep 7, 2025
f5c6d76
clean up
eitanturok Sep 7, 2025
a41bb95
pruned draft vocab works with no compilation
eitanturok Sep 7, 2025
10863c3
remove ic
eitanturok Sep 8, 2025
6a45511
start test
eitanturok Sep 8, 2025
e5b375c
fix test imports
eitanturok Sep 8, 2025
5852deb
test setup
eitanturok Sep 8, 2025
8f75092
comment out asserts
eitanturok Sep 8, 2025
81138d6
init benchmark
eitanturok Sep 8, 2025
e714f9f
benchmark
eitanturok Sep 8, 2025
9d82d63
rmv max_model_len, add draft_vocab_pruned
eitanturok Sep 8, 2025
660b5eb
cleanup cases
eitanturok Sep 8, 2025
db81b8b
remove ic prints
eitanturok Sep 8, 2025
2c40099
remove print
eitanturok Sep 8, 2025
16d02da
record speed stats
eitanturok Sep 8, 2025
8cacd8c
pretty printing
eitanturok Sep 8, 2025
b895a29
fix default SD
eitanturok Sep 8, 2025
3127f08
increase length, better print
eitanturok Sep 8, 2025
408d583
delete extra benchmark
eitanturok Sep 8, 2025
55ebb76
Merge branch 'main' into fr-spec
eitanturok Sep 9, 2025
92775db
clear cuda cache, better prints
eitanturok Sep 9, 2025
bc639bd
Merge remote-tracking branch 'origin/fr-spec' into fr-spec
eitanturok Sep 9, 2025
490cd5a
rename to prune ids
eitanturok Sep 9, 2025
890825b
factor out test
eitanturok Sep 9, 2025
974928d
tests do not work
eitanturok Sep 9, 2025
e6f6a49
spec decode tree now works?
eitanturok Sep 9, 2025
9e91f28
deepcopy works
eitanturok Sep 10, 2025
e0b725f
device works; deepcopy works
eitanturok Sep 10, 2025
3d60331
mock_get_model works
eitanturok Sep 10, 2025
7a3bfda
load model works
eitanturok Sep 10, 2025
a6c65b6
test prune
eitanturok Sep 10, 2025
3ff206a
some progress
eitanturok Sep 10, 2025
0e2207d
test_load_model works
eitanturok Sep 10, 2025
5281d35
load_model is cleaner
eitanturok Sep 10, 2025
e38226e
propose test passes
eitanturok Sep 10, 2025
6018aea
propose tree passes
eitanturok Sep 10, 2025
ad511b8
remove test file
eitanturok Sep 10, 2025
509520b
cleanup test_eagle
eitanturok Sep 10, 2025
3c0eb2c
clean up download pruned vocab
eitanturok Sep 10, 2025
84cad8e
cleanup ic and load_pruned_vocab
eitanturok Sep 10, 2025
2a23fcc
add vocab_freq_dir path
eitanturok Sep 10, 2025
374f5ca
update
eitanturok Sep 10, 2025
acecfe4
cleanup
eitanturok Sep 11, 2025
3cd65fd
test pruning separately
eitanturok Sep 11, 2025
c412b40
better prints
eitanturok Sep 11, 2025
fddaee1
cleaner printing
eitanturok Sep 11, 2025
3c5178d
removed unused param
eitanturok Sep 11, 2025
0cb1949
chatgpt has some good suggestions
eitanturok Sep 11, 2025
2b31e7e
even cleaner prints
eitanturok Sep 11, 2025
8595c99
cleanup tests
eitanturok Sep 11, 2025
0615e96
better tests
eitanturok Sep 11, 2025
14f6989
prune_ratio -> keep_threshold
eitanturok Sep 11, 2025
f5531a9
more progress
eitanturok Sep 11, 2025
e190edf
more progress
eitanturok Sep 11, 2025
1c1e94b
error checking for spec tree
eitanturok Sep 11, 2025
61610c9
update
eitanturok Sep 11, 2025
7a6d99e
batch size param
eitanturok Sep 12, 2025
02772aa
update
eitanturok Sep 12, 2025
638de46
best assert ever
eitanturok Sep 12, 2025
0a4df34
comment it out
eitanturok Sep 12, 2025
06bff37
cleaner
eitanturok Sep 14, 2025
c55ea95
print more stats
eitanturok Sep 14, 2025
1f8422e
cleaner print
eitanturok Sep 14, 2025
42e4973
update
eitanturok Sep 15, 2025
862d870
print target, drafter forward times
eitanturok Sep 15, 2025
cd8466a
more helpful prints
eitanturok Sep 15, 2025
777ead1
Remove prints
eitanturok Sep 16, 2025
42368eb
add seed; by default set to 0 but now it is explicit
eitanturok Sep 16, 2025
233454e
print batch size stats
eitanturok Sep 16, 2025
83e73c3
more prints
eitanturok Sep 16, 2025
79ceff3
comment out prints
eitanturok Sep 16, 2025
b0f76bf
max_num_seqs arg instead of batch_size
eitanturok Sep 16, 2025
7907ab1
no spec decoding when num-spec-tokens==0
eitanturok Sep 16, 2025
0bcba47
Cleaner prints
eitanturok Sep 16, 2025
278c613
track forward_times
eitanturok Sep 16, 2025
34f1a4c
measure target, drafter forward times
eitanturok Sep 16, 2025
b05e5ec
better print
eitanturok Sep 16, 2025
b4a7d64
log everything
eitanturok Sep 17, 2025
bcf9510
vanilla prints stats now
eitanturok Sep 17, 2025
39d621f
remove extra import
eitanturok Sep 17, 2025
327f8ff
code for branch/depth spec token tree
eitanturok Sep 17, 2025
6996221
run some scripts
eitanturok Sep 17, 2025
6a726ae
update
eitanturok Sep 18, 2025
4bfd0bb
delete
eitanturok Sep 19, 2025
8019987
use less memory
eitanturok Sep 19, 2025
187791a
let's track the outputs
eitanturok Sep 19, 2025
d3d9d8d
better print
eitanturok Sep 30, 2025
26bc61c
more
eitanturok Sep 30, 2025
2e6aa0d
print more things
eitanturok Oct 9, 2025
48e5d93
more results
eitanturok Oct 9, 2025
1f2991a
handle branch/depth spec token tree better
eitanturok Oct 16, 2025
828f433
fix output printing
eitanturok Oct 23, 2025
344ece4
print better
eitanturok Oct 23, 2025
03f3b83
multi-turn inference works
eitanturok Oct 23, 2025
265 changes: 212 additions & 53 deletions examples/offline_inference/spec_decode.py

Large diffs are not rendered by default.

56 changes: 56 additions & 0 deletions outputs/20250919_175633/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "100"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}
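Each line of `args.jsonl` above is a one-key JSON object whose value has been stringified (`"None"`, `"False"`, `"100"`). As a minimal sketch, a reader could merge such a file into a single dictionary and recover Python values as follows. The helper names `parse_value` and `load_flat_jsonl` are hypothetical and not part of this PR, and the sample lines are copied from the file above.

```python
import ast
import json


def parse_value(text):
    """Best-effort conversion of a stringified value back to a Python object."""
    try:
        # Handles "None", "False", "100", "0.0", and dict/tuple literals.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Plain strings such as "eagle" or "philschmid/mt-bench" stay as-is.
        return text


def load_flat_jsonl(lines):
    """Merge one-key-per-line JSON objects into a single dict."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for key, value in json.loads(line).items():
            merged[key] = parse_value(value) if isinstance(value, str) else value
    return merged


# A few lines copied from the args.jsonl shown above.
sample = [
    '{"seed": "0"}',
    '{"dataset_path": "philschmid/mt-bench"}',
    '{"method": "eagle"}',
    '{"spec_token_tree": "None"}',
    '{"enforce_eager": "False"}',
    r'{"compilation_config": "{\"level\": \"0\"}"}',
]
args = load_flat_jsonl(sample)
for key, value in args.items():
    print(f"{key}: {value!r}")
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for values read from a file.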
19 changes: 19 additions & 0 deletions outputs/20250919_175633/stats.jsonl
@@ -0,0 +1,19 @@
{"input_tokens": 9468}
{"output_tokens": 21363}
{"input_time": 6.011072126999466}
{"output_time": 452.5689067610002}
{"total_time": 458.5799788879997}
{"drafter_prefill_forward_time": 0.000760874999969019}
{"target_prefill_forward_time": 0.02968344500004605}
{"prefill_forward_ratio": 0.025632974877674696}
{"drafter_decode_forward_time": 7.910397590991124}
{"target_decode_forward_time": 172.45367566900495}
{"decode_forward_ratio": 0.04586969550114876}
{"input_throughput": 1575.093394317017}
{"output_throughput": 47.203861513362234}
{"total_throughput": 67.2314567128757}
{"drafts": 12482}
{"draft_tokens": 12482}
{"draft_utilization_rate": 70.70181060727447}
{"accepted_tokens": 8825}
{"acceptance_length": 1.7070181060727447}
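The derived fields in `stats.jsonl` appear to follow from the raw counters by simple arithmetic. The sketch below recomputes them from the values recorded above. The formulas are inferred from the numbers rather than taken from `spec_decode.py`, and the function name `derive_stats` is hypothetical.

```python
def derive_stats(raw):
    """Recompute derived metrics from raw counters (formulas are assumptions)."""
    total_time = raw["input_time"] + raw["output_time"]
    total_tokens = raw["input_tokens"] + raw["output_tokens"]
    return {
        "total_time": total_time,
        "input_throughput": raw["input_tokens"] / raw["input_time"],
        "output_throughput": raw["output_tokens"] / raw["output_time"],
        "total_throughput": total_tokens / total_time,
        # Drafter forward time as a fraction of target forward time.
        "prefill_forward_ratio": (
            raw["drafter_prefill_forward_time"] / raw["target_prefill_forward_time"]
        ),
        "decode_forward_ratio": (
            raw["drafter_decode_forward_time"] / raw["target_decode_forward_time"]
        ),
        # Percentage of drafted tokens that the target model accepted.
        "draft_utilization_rate": 100 * raw["accepted_tokens"] / raw["draft_tokens"],
        # Each verification step emits one target token plus any accepted drafts.
        "acceptance_length": 1 + raw["accepted_tokens"] / raw["drafts"],
    }


# Raw values copied from outputs/20250919_175633/stats.jsonl.
raw = {
    "input_tokens": 9468,
    "output_tokens": 21363,
    "input_time": 6.011072126999466,
    "output_time": 452.5689067610002,
    "drafter_prefill_forward_time": 0.000760874999969019,
    "target_prefill_forward_time": 0.02968344500004605,
    "drafter_decode_forward_time": 7.910397590991124,
    "target_decode_forward_time": 172.45367566900495,
    "drafts": 12482,
    "draft_tokens": 12482,
    "accepted_tokens": 8825,
}
derived = derive_stats(raw)
for name, value in derived.items():
    print(f"{name}: {value:.6f}")
```

With `num_spec_tokens` set to 1, `drafts` and `draft_tokens` are equal in this run, so `acceptance_length` is one plus the utilization rate expressed as a fraction.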
56 changes: 56 additions & 0 deletions outputs/20250919_181208/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "100"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}
56 changes: 56 additions & 0 deletions outputs/20250919_181227/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "3"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}
56 changes: 56 additions & 0 deletions outputs/20250919_181518/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "3"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}
56 changes: 56 additions & 0 deletions outputs/20250919_181951/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "2"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}
56 changes: 56 additions & 0 deletions outputs/20250919_182203/args.jsonl
@@ -0,0 +1,56 @@
{"seed": "0"}
{"request_id_prefix": ""}
{"num_prompts": "2"}
{"dataset_name": "hf"}
{"no_stream": "False"}
{"dataset_path": "philschmid/mt-bench"}
{"custom_output_len": "256"}
{"custom_skip_chat_template": "True"}
{"spec_bench_output_len": "256"}
{"spec_bench_category": "None"}
{"sonnet_input_len": "550"}
{"sonnet_output_len": "150"}
{"sonnet_prefix_len": "200"}
{"sharegpt_output_len": "None"}
{"blazedit_min_distance": "0.0"}
{"blazedit_max_distance": "1.0"}
{"random_input_len": "1024"}
{"random_output_len": "128"}
{"random_range_ratio": "0.0"}
{"random_prefix_len": "0"}
{"random_batch_size": "1"}
{"random_mm_base_items_per_request": "1"}
{"random_mm_num_mm_items_range_ratio": "0.0"}
{"random_mm_limit_mm_per_prompt": "{'image': 255, 'video': 0}"}
{"random_mm_bucket_config": "{(256, 256, 1): 0.5, (720, 1280, 1): 0.5, (720, 1280, 16): 0.0}"}
{"hf_subset": "None"}
{"hf_split": "train"}
{"hf_name": "None"}
{"hf_output_len": "None"}
{"prefix_repetition_prefix_len": "256"}
{"prefix_repetition_suffix_len": "256"}
{"prefix_repetition_num_prefixes": "10"}
{"prefix_repetition_output_len": "128"}
{"method": "eagle"}
{"num_spec_tokens": "1"}
{"spec_token_tree": "None"}
{"spec_token_tree_depth": "None"}
{"spec_token_tree_branching": "None"}
{"prompt_lookup_max": "5"}
{"prompt_lookup_min": "2"}
{"tp": "1"}
{"enforce_eager": "False"}
{"enable_chunked_prefill": "False"}
{"temp": "0"}
{"top_p": "1.0"}
{"top_k": "-1"}
{"print_output": "False"}
{"max_num_seqs": "1"}
{"output_len": "256"}
{"model_dir": "None"}
{"eagle_dir": "None"}
{"custom_mm_prompts": "False"}
{"draft_vocab_frequency_path": "None"}
{"draft_vocab_frequency_keep_threshold": "None"}
{"compilation_config": "{\"level\": \"0\"}"}
{"endpoint_type": "openai-chat"}