
Conversation

@jmamou (Collaborator) commented Dec 4, 2024

If the draft and target models are on different devices, then without this fix we get an error when running _speculative_sampling at

q_i = q[:, torch.arange(candidate_length), new_candidate_input_ids].squeeze(0, 1)

because q and new_candidate_input_ids are not on the same device.
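
For reference, a minimal self-contained sketch of the device alignment the fix amounts to; the shapes and the exact placement of the .to(q.device) call here are illustrative, not the literal diff in this PR:

```python
import torch

# Illustrative shapes: q holds the target model's probabilities,
# new_candidate_input_ids holds the draft model's candidate tokens.
q = torch.rand(1, 5, 32000)                                # lives on the target device
new_candidate_input_ids = torch.randint(0, 32000, (1, 5))  # may live on the draft device
candidate_length = new_candidate_input_ids.shape[-1]

# Moving the candidate ids (and the index tensor) onto q's device before the
# advanced indexing avoids the cross-device RuntimeError described above.
new_candidate_input_ids = new_candidate_input_ids.to(q.device)
q_i = q[:, torch.arange(candidate_length, device=q.device), new_candidate_input_ids].squeeze()
```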

@jmamou (Collaborator, Author) commented Dec 4, 2024

@keyboardAnt note that the issue also occurs with regular SD (speculative decoding)

keyboardAnt merged commit bfccdea into usd Dec 5, 2024
keyboardAnt deleted the fix_device branch December 5, 2024 01:27
keyboardAnt pushed a commit that referenced this pull request Mar 13, 2025
* Fix converter

* [Broken] Adds Gemma 3 to Hugging Face Transformers

* Consolidating Config and Processor params across impls

* Sorting out configuration parameters. Adds qk_norm before RoPE. Still not sure if RoPE is right.

* Additional plumbing for CausalLM and ConditionalGeneration variants

* incomplete draft of Orbax conversion script

* More complete checkpoint conversion

* Supporting Gemma 3 1B checkpoints

* Updating RoPE for multiple frequencies

* Adjustments to rotary embedder

* Proof of life for text-only operation

* Updating the conversion script to handle multimodal projection weights

* Fixing text-only conversions

* Cleaner conversion script with multimodal support and a simpler processor

* Additional refactors to the Gemma3Processor

* Simplified Processor to work over text representations

* Updated conversion script to join text and vision embeddings at conversion time

* Logging for debugging

* Update src/transformers/models/gemma2/modeling_gemma2.py

Co-authored-by: Joshua Lochner <[email protected]>

* Removed extraneous Config params

* Switching to fast tokenizer for checkpoint conversions

* isolating siglip for performance testing

* Minor changes for debugging tests against baselines

* Adding average pooling for soft tokens

* Updating processor code to enable simpler embedding interleaving for arbitrary number of images in prompts

* Updating conversion script for ShieldGemma 2 conversion compatibility

* Allow disable_compile to be provided as a kwarg

* Refresh from modular

* Updated conversion script and corrected sliding window

* Fix type mismatch in cache_position (#4)

* Fix dtype (#5)

* Fix type mismatch in cache_position

* Actually fix in the modular file

Co-authored-by: Aritra Roy Gosthipaty <[email protected]>

---------

Co-authored-by: Aritra Roy Gosthipaty <[email protected]>

* fixes for embedding table overflow and missing image_soft_token_mask from Gemma3Processor

* Adding 2D pooling for image embeddings

* Revert "Adding 2D pooling for image embeddings"

This reverts commit 65350cf.

* Gemma3 average pooling changed from 1D to 2D

* Major refactor to Gemma3MultimodalInputProjection

* Updating Gemma 3 Auto* registrations

* Add option to save Gemma 3 chat template with tokenizer during weights conversion

* Removing unused imports

* Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditionalGeneration

* Removing duplicate config property

* Removing final logit softcapping and 1-indexing of position ids

* Fixing image processor config and none --> None typo

* Fixing sliding window size for 1B

* Updating image_mean and image_std in Image Processor

* Attention masking changed to lower triangular

* Moving image special tokens to conversion script

* Mirror image processor defaults from conversion script into Gemma3ProcessorKwargs

* Remove special token variables from symbol space

* Moving image soft token mask computation from Gemma3Processor to Gemma3ForConditionalGeneration

* tie lm_head and embedding weights

Co-authored-by: Matthew Douglas <[email protected]>

* Correct tied weights in Gemma3CausalLM

* iterative bidirectional attention

* resolving merge conflicts

* Reverting to Gemma 2 HybridCache with sliding window support and a sliding_window_pattern of 6

* Correcting RoPE scaling

* clean up first pass, dummy model generation works

* final clean up before fixing tests

* causal lm test works, so fine

* Fix conversion

* Update src/transformers/models/gemma3/processing_gemma3.py

* model tests are happy

* processor tests are happy

* image processing tests added

* fixup

* Fix pre-processing in conversion

* Inputs merging

* Do not normalize vision embeddings

* Apply Ryan's (and team) changes to attention

* token type ids + mask

* template

* move embed scale, add rope scale, fix tests

* Add chat template to tokenizer

* Use prefix for causal model loading

* use existing code for sliding mask from gemma2

* self.embed_tokens already normalizes

* Correcting Gemma3TextConfig parameters in conversion script

* typo, modular overwrites my fixes

* enable device map for text model

* Conversion updates

* ultra nit: no einsums

* update image token

* copy deepcopy config + some docs

* add some test, still WIP

* Refactoring --include_chat_template logic in converter

* Update src/transformers/models/gemma3/modular_gemma3.py

Co-authored-by: Xuan-Son Nguyen <[email protected]>

* Add eos tokens for instruct models

* dump so i can work on dgx

* Removing add_bos by default

* dump

* add fast im proc

* docs for PaS + fixup

* another fixup

* one more fixup

* fix tests

* Inverting prior BOS change

* ultra nit

* Reverting to Tokenizer saved with add_bos_token=True and chat template starting with BOS

* resize embeds, remove sqrt, add slow test outputs

* FA2 but quality is meh

* nit

* skip FA2, no idea what happened

* last bit for green CI

* please, green CI for docs

* T_T

* Fix for Gemma3 logits

* Support both options for system prompt

* Update src/transformers/models/gemma3/image_processing_gemma3_fast.py

Co-authored-by: Pedro Cuenca <[email protected]>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <[email protected]>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <[email protected]>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <[email protected]>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <[email protected]>

* Update docs/source/en/model_doc/gemma3.md

Co-authored-by: Pedro Cuenca <[email protected]>

* Docs updates now that assets are live

* Style fixes

---------

Co-authored-by: Joshua Lochner <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: Mayank Chaturvedi <[email protected]>
Co-authored-by: Matthew Douglas <[email protected]>
Co-authored-by: raushan <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
Co-authored-by: Lysandre <[email protected]>