


@jlamypoirier commented on Nov 27, 2025

✨ Description

  • Add HuggingfaceMultiModalModelForCausalLM, which wraps multimodal models for Hugging Face following the LLaVA format.
  • Merge the contents of HuggingfaceBaseModelForCausalLM into HuggingfacePreTrainedModel and generalize it to arbitrary inputs.
  • Rework output_hidden_states into an extensive debugging utility built on the existing DebugLayer. When calling the model, one can "request" specific hidden states by passing a list of names in kwargs["output_hidden_states"] (output_hidden_states in the HF wrapper). The hidden states whose names match (as regular expressions) are returned in kwargs["hidden_states"]. This is still experimental but has already helped a lot with debugging. Example:
    >>> model_fast_llm(test_input, pixel_values=pixels, output_hidden_states=["vision_encoder.encoder.0.mixer", "head.logits"])
    CausalLMOutputWithPast(loss=None, logits=tensor(...), past_key_values=[], hidden_states=
    {'vision_encoder.encoder.0.mixer.query_rotary_input': tensor(...), 'vision_encoder.encoder.0.mixer.key_rotary_input': tensor(...),
    'vision_encoder.encoder.0.mixer.query': tensor(...), 'vision_encoder.encoder.0.mixer.key': tensor(...),
    'vision_encoder.encoder.0.mixer.value': tensor(...), 'vision_encoder.encoder.0.mixer.context': tensor(...),
    'vision_encoder.encoder.0.mixer': tensor(...), 'head.logits': tensor(...)}, attentions=None)
    
  • Replace the patch "convolution" with a simpler linear layer.
  • Add support for linear layers that do not compute input gradients (e.g., vision embeddings).
  • Fix the patch ordering in get_patches_from_images.
  • Add the missing causal and cross_document_attention to the LLaVA conversion.
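The hidden-state request mechanism can be illustrated with a small pure-Python sketch. HiddenStateRecorder and toy_forward are hypothetical names, not Fast-LLM's DebugLayer API, and matching with re.match (anchored at the start of the name) is an assumption chosen because it reproduces the example output above, where requesting "vision_encoder.encoder.0.mixer" also returned its .query, .key, ... sub-states.

```python
import re
from typing import Any


class HiddenStateRecorder:
    """Hypothetical helper (not Fast-LLM's DebugLayer) that keeps only the
    intermediate values whose names match one of the requested regex patterns."""

    def __init__(self, requested: list[str]) -> None:
        # Assumption: names are matched with re.match, i.e. anchored at the start,
        # so "vision_encoder.encoder.0.mixer" also selects ".mixer.query", ".mixer.key", ...
        self._patterns = [re.compile(pattern) for pattern in requested]
        self.hidden_states: dict[str, Any] = {}

    def is_requested(self, name: str) -> bool:
        return any(pattern.match(name) for pattern in self._patterns)

    def record(self, name: str, value: Any) -> None:
        # Store only requested values; the recorder keeps no reference to the others.
        if self.is_requested(name):
            self.hidden_states[name] = value


def toy_forward(x: float, kwargs: dict[str, Any]) -> float:
    """Hypothetical two-layer 'model' showing the kwargs round trip:
    requests come in through kwargs["output_hidden_states"] and the
    matches go back out through kwargs["hidden_states"]."""
    recorder = HiddenStateRecorder(kwargs.get("output_hidden_states", []))
    hidden = x * 2.0
    recorder.record("decoder.0.mlp", hidden)
    logits = hidden + 1.0
    recorder.record("head.logits", logits)
    kwargs["hidden_states"] = recorder.hidden_states
    return logits


recorder = HiddenStateRecorder(["vision_encoder.encoder.0.mixer", "head.logits"])
for name in [
    "vision_encoder.encoder.0.mixer.query",
    "vision_encoder.encoder.0.mixer",
    "vision_encoder.encoder.1.mixer",
    "head.logits",
]:
    recorder.record(name, f"<tensor {name}>")
print(list(recorder.hidden_states))
# ['vision_encoder.encoder.0.mixer.query', 'vision_encoder.encoder.0.mixer', 'head.logits']

kwargs = {"output_hidden_states": ["head.logits"]}
toy_forward(3.0, kwargs)
print(kwargs["hidden_states"])  # {'head.logits': 7.0}
```

Because the requested names are regular expressions, the dots in "vision_encoder.encoder.0.mixer" match any character; wrapping a name in re.escape would force a literal match.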
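A patch "convolution" whose kernel size equals its stride sees each pixel exactly once, so it is just a linear layer applied independently to each flattened patch. The pure-Python sketch below checks this equivalence on a random non-square image. The function names are hypothetical (not get_patches_from_images or any Fast-LLM API), and the sketch assumes patches are emitted in row-major order and flattened as (channel, row, column), which matches both the layout of a [D, C, P, P] convolution kernel and the order of a flattened convolution output.

```python
import math
import random

Image = list[list[list[float]]]  # indexed as image[channel][row][col]


def extract_patches(image: Image, patch_size: int) -> list[list[float]]:
    """Hypothetical patch extractor: non-overlapping patches in row-major order,
    each flattened in (channel, row, col) order."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    patches = []
    for i in range(height // patch_size):  # patch row (outer loop => row-major order)
        for j in range(width // patch_size):  # patch column
            patches.append([
                image[c][i * patch_size + u][j * patch_size + v]
                for c in range(channels)
                for u in range(patch_size)
                for v in range(patch_size)
            ])
    return patches


def conv_patch_embed(image: Image, kernel, bias: list[float], patch_size: int) -> list[list[float]]:
    """Convolution with kernel_size == stride == patch_size; kernel[d][c][u][v].
    Output positions are emitted in row-major order, like a flattened conv output."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    out = []
    for i in range(height // patch_size):
        for j in range(width // patch_size):
            out.append([
                bias[d] + sum(
                    kernel[d][c][u][v] * image[c][i * patch_size + u][j * patch_size + v]
                    for c in range(channels)
                    for u in range(patch_size)
                    for v in range(patch_size)
                )
                for d in range(len(kernel))
            ])
    return out


def linear_patch_embed(patches: list[list[float]], weight, bias: list[float]) -> list[list[float]]:
    """Plain linear layer applied to each flattened patch; weight[d][k]."""
    return [
        [b + sum(w * x for w, x in zip(row, patch)) for row, b in zip(weight, bias)]
        for patch in patches
    ]


random.seed(0)
C, H, W, P, D = 3, 4, 6, 2, 5  # channels, height, width, patch size, embedding dim
image = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
kernel = [[[[random.random() for _ in range(P)] for _ in range(P)] for _ in range(C)] for _ in range(D)]
bias = [random.random() for _ in range(D)]

# Reshape the conv kernel [D, C, P, P] into a linear weight [D, C * P * P].
weight = [[kernel[d][c][u][v] for c in range(C) for u in range(P) for v in range(P)] for d in range(D)]

conv_out = conv_patch_embed(image, kernel, bias, P)
linear_out = linear_patch_embed(extract_patches(image, P), weight, bias)
print(len(conv_out), len(linear_out))  # 6 6  (a 2 x 3 grid of patches)
```

If patches were flattened in a different order, for example (row, column, channel), the reshaped kernel weights would no longer line up with the inputs; if they were emitted column-major, each embedding would land at the wrong sequence position. Both mistakes run without error, which makes this kind of ordering bug easy to miss.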
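Vision embeddings take raw pixel patches as input and nothing upstream of them has parameters, so the gradient with respect to their input is never used. A linear layer's backward pass normally computes both grad_weight = grad_output^T x and grad_input = grad_output W; skipping the second saves one matrix multiplication. The pure-Python sketch below illustrates the idea with a hypothetical linear_backward function; it is not Fast-LLM's implementation. In PyTorch, a custom torch.autograd.Function can make the same decision by checking ctx.needs_input_grad in its backward.

```python
from typing import Optional

Matrix = list[list[float]]


def linear_backward(
    x: Matrix, weight: Matrix, grad_output: Matrix, compute_input_grad: bool
) -> tuple[Optional[Matrix], Matrix]:
    """Backward pass of y = x @ weight.T (no bias), for x: [N][K], weight: [D][K],
    grad_output: [N][D]. Returns (grad_input or None, grad_weight)."""
    n, k, d = len(x), len(x[0]), len(weight)
    # grad_weight = grad_output.T @ x, always needed to train the layer.
    grad_weight = [
        [sum(grad_output[r][o] * x[r][i] for r in range(n)) for i in range(k)]
        for o in range(d)
    ]
    grad_input = None
    if compute_input_grad:
        # grad_input = grad_output @ weight, only needed if something upstream is trainable.
        grad_input = [
            [sum(grad_output[r][o] * weight[o][i] for o in range(d)) for i in range(k)]
            for r in range(n)
        ]
    return grad_input, grad_weight


x = [[1.0, 2.0]]  # e.g. a flattened pixel patch with no upstream parameters
weight = [[3.0, 4.0], [5.0, 6.0]]
grad_output = [[1.0, 1.0]]
print(linear_backward(x, weight, grad_output, compute_input_grad=False))
# (None, [[1.0, 2.0], [1.0, 2.0]])
print(linear_backward(x, weight, grad_output, compute_input_grad=True))
# ([[8.0, 10.0]], [[1.0, 2.0], [1.0, 2.0]])
```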

@jlamypoirier marked this pull request as ready for review on November 28, 2025, 01:46