Fix minor review comments #16

gauravjain14 · 2025-01-24T19:48:15Z

@keyboardAnt - these address the comments on the PR huggingface#35029

keyboardAnt · 2025-01-24T20:09:40Z

Thanks @gauravjain14. It seems like this PR addresses only a subset of the issues raised in the reviews. Can you please address the remaining concerns as well and run the benchmark script to ensure there are no regressions?

gauravjain14 · 2025-01-24T21:00:15Z

@keyboardAnt - I have addressed all of the recommended fixes except this - huggingface#35029 (comment)

The comments I haven't addressed are the ones for which Jon has already provided justifications; I went through them and they look good to me.

Let me know if you think I have missed any in particular.

I'll run the benchmarking script.

gauravjain14 · 2025-01-25T09:43:21Z

Verified functionality with the benchmarking script in the `nadav/benchmark` branch by running `python benchmark.py` in `transformers/benchmark_usd`.

keyboardAnt · 2025-01-25T20:52:36Z

@gauravjain14, thanks for confirming the functionality. Could you please share the benchmark results as well? I’d like to ensure that the new changes don’t introduce any slowdowns compared to the current version.

keyboardAnt

@gauravjain14 Please add all the details regarding the comparison between the commits, including what you have run and the results. Then feel free to squash-merge. Thanks!

gauravjain14 · 2025-01-27T16:43:53Z

Results from benchmark.py (from nadav/benchmark), analyzed using usd/analyze.py from https://github.com/keyboardAnt/speculative-decoding-any-vocab:
Base model: llama-3.1-8b-instruct, Dataset: cnn dailymail, num_of_examples: 20

Harmonic Mean:
Llama 1B TPOT 0.054
Llama 3B TPOT 0.069
Qwen USD TPOT 0.086
Baseline TPOT 0.095
Qwen UAG TPOT 0.10
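
To make the comparison against the baseline explicit, here is a minimal sketch that turns the harmonic-mean TPOT values above into ratios relative to the baseline. It assumes TPOT means time per output token, so lower is better. The helper names `speedup_vs_baseline` and `aggregate_tpot` are hypothetical and are not taken from `usd/analyze.py`; `aggregate_tpot` only illustrates how a harmonic mean over per-example values could be computed.

```python
from statistics import harmonic_mean

# Harmonic-mean TPOT values copied from the results above.
tpot = {
    "Llama 1B": 0.054,
    "Llama 3B": 0.069,
    "Qwen USD": 0.086,
    "Baseline": 0.095,
    "Qwen UAG": 0.10,
}


def speedup_vs_baseline(results, baseline_key="Baseline"):
    """Return baseline TPOT divided by each configuration's TPOT.

    A ratio above 1.0 means the configuration is faster than the baseline,
    under the assumption that a lower TPOT is better.
    """
    base = results[baseline_key]
    return {name: base / value for name, value in results.items()}


def aggregate_tpot(per_example_tpot):
    """Hypothetical aggregation: harmonic mean of per-example TPOT values."""
    return harmonic_mean(per_example_tpot)


speedups = speedup_vs_baseline(tpot)
# Print configurations from fastest to slowest relative to the baseline.
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}x")
```

Read this way, the only configuration in this run with a higher TPOT than the baseline is Qwen UAG (0.10 vs. 0.095), so its ratio falls below 1.0.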

* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file
* refactor
* NOTHING. add space to rerun github actions tests
* remove it...
* `UniversalSpeculativeDecodingGenerator`
* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* add `TestGenerateWithDifferentModels`
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* `UniversalSpeculativeDecodingGenerator`
* assistant tokenizes only the target's new suffix
* formatting
* fix code
* fix code
* formatting
* `TestGenerateWithDifferentModels` parameterize on `do_sample`
* `AssistantVocabMapping` & `AssistantVocabMappingCache`
* formatting
* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`
* improve `_get_assistant_to_target_input_ids` & formatting
* renaming
* WIP: debugging `min_new_tokens`
* fix get_target_ids
* fix device issue
* fix get_assistant_input_ids
* add `TestAssistedCandidateGeneratorDifferentTokenizers`
* formatting
* `AssistantVocabTranslatorCache` refactor & tests
* revert changes in `src/transformers/generation/logits_process.py`
* refactor `AssistedCandidateGenerator`
* refactor `AssistedCandidateGeneratorDifferentTokenizers`
* formatting
* refactor `UniversalSpeculativeDecodingGenerator`
* fix negative value for max_new_tokens
* fix generation length target + attention_mask vs. assistant + attent
* fix device
* fix negative max_new_tokens bug
* fix UAG
* minor
* formatting
* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init
* resolve conflict & formatting
* rerun CI tests
* remove space...
* remove old code
* fix candidate_input_ids device
* minor
* formatting
* Fix prepare + apply (#7)
* fix prepare + apply
* move to cpu
* simplity suppress_tokens
* fix bugs and refacatoring
* device move
* handle self.config.vocab_size > len(target_tokenizer.get_vocab())
* no need to normalize in candidate_generator
* address Nadav's comments + minor
* optimize device move + SuppressTokensLogitsProcessor
* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements
* padding size
* padding improvement
* fix and simplify get_target_logits
* renaming in get_target_logits
* minor
* add filter_value and suppress_tokens_id
* style + rename
* remove TODO
* restore original SelectTokensLogitsProcessor with modification
* fix style
* fix _update_past_and_masks and optimize code
* remove assistant_vocab_size arg
* fix attention_mask
* call _prepare_attention_mask also if not has_past_key_values
* handling attention mask for first generation
* comment
* restore test
* remove SelectTokensLogitsProcessor
* _update_past_and_masks implementation for USD
* Add unittests for Universal Assisted generation
* fix style
* update tests
* Remove unused import and fix `test_speculation_depth` test
* exclude special and reserved tokens from tokenizer for UAG
* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`
* Remove unused imports and fix style using `make style` (#9)
* formatting
* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)
* Fix space sign disagreement (#12)
* default values for AssistantToTargetTranslator fileds
* fix space sign
* minor
* fix test + style
* Default values for some fields of assistant to target translator (#11)
* default values for AssistantToTargetTranslator fileds
* fix
* add support to empty logit_processors
* Update candidate_generator.py (#15) fix typo
* BUG fix in _prepare_assistant_input_ids (#14)
* fix _prepare_assistant_input_ids
* target_to_assistant_input_ids
* Update src/transformers/generation/candidate_generator.py
  Co-authored-by: Nadav Timor <[email protected]>
  ---------
  Co-authored-by: Nadav Timor <[email protected]>
* typo (`target_to_assistant_input_ids`)
* formatting
* merge upstream/main
* Fix minor review comments (#16)
* Fix: `token_ids.to(torch.int64)` (#18)
* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)
* `LongTensor`
* fix dtype
* `assistant_input_ids.to(dtype=torch.long)`
* Remove unused import from test_candidate_generator.py
* Remove unused import from test_candidate_generator.py
* Remove `numpy` import
* resolve pr comments (#19)
* `AssistantToTargetTranslator` docstring
* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants
* update `AssistantToTargetTranslator` docstring
* (gante's comment) replace `match-case`
* formatting
* Fix Joao's comments (#21)
* remove threading
* fix logits_processor
* fix test device
* fix style (#23)
* Move atm (#24)
* move AssistantToTargetTranslator
* fixup
* fix logit_processor
* add atm_translator test
* refactor test
* remove threading from test
* add require_torch in tests
* move AssistantVocabTranslatorCache + add tests
* ruff fix

---------

Co-authored-by: jmamou <[email protected]>
Co-authored-by: Gaurav <[email protected]>
Co-authored-by: Gaurav Jain <[email protected]>
Co-authored-by: gauravjain14 <[email protected]>

Fix minor review comments

b742d69

keyboardAnt self-requested a review January 27, 2025 14:03

keyboardAnt approved these changes Jan 27, 2025

gauravjain14 merged commit 4e3660a into usd Jan 27, 2025
