Fix space sign disagreement #12

jmamou · 2025-01-07T13:26:43Z

In BPE tokenizers, all tokens that occur at the beginning of a word, are stored twice as 'token' and ' token'.
Different tokenizers may use different special characters to represent space and we did not take it into account in _get_assistant_to_target_input_ids. Overlap will increase significantly.
The bug has an impact on the target/draft pairs we reported slowdown.

keyboardAnt

To address the issue we discussed on Monday regarding counting the size of the intersection between the vocabularies, I propose the following method instead of replacing specific characters like spaces. For any draft token d, decode it into a string D(d), and tokenize this string into the target vocabulary T(D(d)). If T(D(d)) yields a single target token t, then d and t are considered equal. Otherwise, there is no 1:1 mapping.

jmamou · 2025-01-08T13:39:54Z

To address the issue we discussed on Monday regarding counting the size of the intersection between the vocabularies, I propose the following method instead of replacing specific characters like spaces. For any draft token d, decode it into a string D(d), and tokenize this string into the target vocabulary T(D(d)). If T(D(d)) yields a single target token t, then d and t are considered equal. Otherwise, there is no 1:1 mapping.

Your proposition assumes that if an assistant token can be tokenized either as a single token or as multiple tokens by the target tokenizer, it will be tokenized as a single token.
Can we ensure it?

keyboardAnt

We discussed your concern about my proposal over the phone:

Tokenizers might treat the tokens at the beginning of the tokenized string differently. For example, BPE tokenizers may add a BOS token. If the target tokenizer T is a BPE tokenizer, then we might have:

T(s + D(d)) != T(s) + T(D(d))

for some nonempty string s, where + denotes string concatenation.

Therefore, we agreed to move forward with the solution that replaces the space symbol directly.

The only minor issue is the need to rerun the formatting required by Hugging Face. Running make style should suffice.

keyboardAnt

LGTM. Please squash-merge.

* gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass **kwargs * limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass **kwargs * add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Resolve vptq conflict * Rename spqr package to spqr_quant * Get rid of aqlm mention * Start working on tests * Resolve ruff code checks * Ruff format * Isort * Test updates * Add gpu tag * Rename to modules_to_not_convert * Config update * Docs and config update * Docs and config update * Update to update_torch_dtype * spqr config parameter validation * Ruff update * Apply ruff fixes * Test fixes * Ruff update * Mark tests as @slow again; Ruff; Docstring update * Ruff * Remove absolute path * Resolve typo * Remove redundandt log * Check accelerate/spqr availability * Ruff fix * Check if the config contains proper shapes * Ruff test * Documentation update * overview update * Ruff checks * Ruff code quality * Make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update spqr.md * Enable gptqmodel (huggingface#35012) * gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update readme Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * gptqmodel need use checkpoint_format (#1) * gptqmodel need use checkpoint_format * fix quantize * Update quantization_config.py * Update quantization_config.py * Update quantization_config.py --------- Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * Revert quantizer_gptq.py (#2) * revert quantizer_gptq.py change * pass **kwargs * limit gptqmodel and optimum version Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix warning Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix version check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * revert unrelated changes Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * enable gptqmodel tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix requires gptq Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Fix Transformer compat (#3) * revert quantizer_gptq.py change * pass **kwargs * add meta info * cleanup * cleanup * Update quantization_config.py * hf_select_quant_linear pass checkpoint_format and meta * fix GPTQTestCUDA * Update test_gptq.py * gptqmodel.hf_select_quant_linear() now does not select ExllamaV2 * cleanup * add backend * cleanup * cleanup * no need check exllama version * Update quantization_config.py * lower checkpoint_format and backend * check none * cleanup * Update quantization_config.py * fix self.use_exllama == False * spell * fix unittest * fix unittest --------- Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * fix format Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix format again Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update gptqmodel version (#6) * update gptqmodel version * update gptqmodel version * fix unit test (#5) * update gptqmodel version * update gptqmodel version * "not self.use_exllama" is not equivalent to "self.use_exllama==False" * fix unittest * update gptqmodel version * backend is loading_attibutes (#7) * fix format and tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix memory check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix device mismatch Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * fix result check Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/quantizers/quantizer_gptq.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * update tests Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * review: update docs (#10) * review: update docs (#12) * review: update docs * fix typo * update tests for gptqmodel Signed-off-by: jiqing-feng <jiqing.feng@intel.com> * update document (#9) * update overview.md * cleanup * Update overview.md * Update overview.md * Update overview.md * update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md * Update gptq.md --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> * typo * doc note for asymmetric quant * typo with apple silicon(e) * typo for marlin * column name revert: review * doc rocm support * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/gptq.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/quantization/overview.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Fix : Nemotron Processor in GGUF conversion (huggingface#35708) * fixing nemotron processor * make style * Update docs/source/en/quantization/spqr.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add missing TOC to doc --------- Signed-off-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: jiqing-feng <jiqing.feng@intel.com> Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com> Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai> Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai> Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com> Co-authored-by: LRL <lrl@lbx.dev> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file * refactor * NOTHING. add space to rerun github actions tests * remove it... * `UniversalSpeculativeDecodingGenerator` * Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True` * assistant tokenizes only the target's new suffix * formatting * fix code * fix code * formatting * add `TestGenerateWithDifferentModels` * `TestGenerateWithDifferentModels` parameterize on `do_sample` * `AssistantVocabMapping` & `AssistantVocabMappingCache` * formatting * `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits` * improve `_get_assistant_to_target_input_ids` & formatting * renaming * WIP: debugging `min_new_tokens` * fix get_target_ids * `UniversalSpeculativeDecodingGenerator` * assistant tokenizes only the target's new suffix * formatting * fix code * fix code * formatting * `TestGenerateWithDifferentModels` parameterize on `do_sample` * `AssistantVocabMapping` & `AssistantVocabMappingCache` * formatting * `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits` * improve `_get_assistant_to_target_input_ids` & formatting * renaming * WIP: debugging `min_new_tokens` * fix get_target_ids * fix device issue * fix get_assistant_input_ids * add `TestAssistedCandidateGeneratorDifferentTokenizers` * formatting * `AssistantVocabTranslatorCache` refactor & tests * revert changes in `src/transformers/generation/logits_process.py` * refactor `AssistedCandidateGenerator` * refactor `AssistedCandidateGeneratorDifferentTokenizers` * formatting * refactor `UniversalSpeculativeDecodingGenerator` * fix negative value for max_new_tokens * fix generation length target + attention_mask vs. assistant + attent * fix device * fix negative max_new_tokens bug * fix UAG * minor * formatting * `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init * resolve conflict & formatting * rerun CI tests * remove space... * remove old code * fix candidate_input_ids device * minor * formatting * Fix prepare + apply (#7) * fix prepare + apply * move to cpu * simplity suppress_tokens * fix bugs and refacatoring * device move * handle self.config.vocab_size > len(target_tokenizer.get_vocab()) * no need to normalize in candidate_generator * address Nadav's comments + minor * optimize device move + SuppressTokensLogitsProcessor * AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements * padding size * padding improvement * fix and simplify get_target_logits * renaming in get_target_logits * minor * add filter_value and suppress_tokens_id * style + rename * remove TODO * restore original SelectTokensLogitsProcessor with modification * fix style * fix _update_past_and_masks and optimize code * remove assistant_vocab_size arg * fix attention_mask * call _prepare_attention_mask also if not has_past_key_values * handling attention mask for first generation * comment * restore test * remove SelectTokensLogitsProcessor * _update_past_and_masks implementation for USD * Add unittests for Universal Assisted generation * fix style * update tests * Remove unused import and fix `test_speculation_depth` test * exclude special and reserved tokens from tokenizer for UAG * mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py` * Remove unused imports and fix style using `make style` (#9) * formatting * Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10) * Fix space sign disagreement (#12) * default values for AssistantToTargetTranslator fileds * fix space sign * minor * fix test + style * Default values for some fields of assistant to target translator (#11) * default values for AssistantToTargetTranslator fileds * fix * add support to empty logit_processors * Update candidate_generator.py (#15) fix typo * BUG fix in _prepare_assistant_input_ids (#14) * fix _prepare_assistant_input_ids * target_to_assistant_input_ids * Update src/transformers/generation/candidate_generator.py Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il> --------- Co-authored-by: Nadav Timor <nadav.timor@weizmann.ac.il> * typo (`target_to_assistant_input_ids`) * formatting * merge upstream/main * Fix minor review comments (#16) * Fix: `token_ids.to(torch.int64)` (#18) * tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers) * `LongTensor` * fix dtype * `assistant_input_ids.to(dtype=torch.long)` * Remove unused import from test_candidate_generator.py * Remove unused import from test_candidate_generator.py * Remove `numpy` import * resolve pr comments (#19) * `AssistantToTargetTranslator` docstring * (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants * update `AssistantToTargetTranslator` docstring * (gante's comment) replace `match-case` * formatting * Fix Joao's comments (#21) * remove threading * fix logits_processor * fix test device * fix style (#23) * Move atm (#24) * move AssistantToTargetTranslator * fixup * fix logit_processor * add atm_translator test * refactor test * remove threading from test * add require_torch in tests * move AssistantVocabTranslatorCache + add tests * ruff fix --------- Co-authored-by: jmamou <jonathan.mamou@intel.com> Co-authored-by: Gaurav <gauravj@d-matrix.ai> Co-authored-by: Gaurav Jain <gaurjain14@gmail.com> Co-authored-by: gauravjain14 <41287729+gauravjain14@users.noreply.github.com>

jmamou added 2 commits January 7, 2025 05:23

default values for AssistantToTargetTranslator fileds

172b3cd

fix space sign

1b8dec7

jmamou requested review from gauravjain14 and keyboardAnt January 7, 2025 13:33

minor

7fcb49e

keyboardAnt requested changes Jan 8, 2025

View reviewed changes

keyboardAnt approved these changes Jan 9, 2025

View reviewed changes

fix test + style

e10c4b7

jmamou requested a review from keyboardAnt January 9, 2025 11:53

keyboardAnt approved these changes Jan 9, 2025

View reviewed changes

jmamou merged commit 25cd5da into usd Jan 9, 2025

jmamou deleted the fix_space branch January 9, 2025 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix space sign disagreement #12

Fix space sign disagreement #12

Uh oh!

jmamou commented Jan 7, 2025

Uh oh!

keyboardAnt left a comment

Uh oh!

jmamou commented Jan 8, 2025

Uh oh!

keyboardAnt left a comment •

edited

Loading

Uh oh!

keyboardAnt left a comment •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix space sign disagreement #12

Fix space sign disagreement #12

Uh oh!

Conversation

jmamou commented Jan 7, 2025

Uh oh!

keyboardAnt left a comment

Choose a reason for hiding this comment

Uh oh!

jmamou commented Jan 8, 2025

Uh oh!

keyboardAnt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

keyboardAnt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

keyboardAnt left a comment •

edited

Loading

keyboardAnt left a comment •

edited

Loading