AshAnand34 commented May 10, 2025

Description

This pull request introduces support for the new SmolVLM2 model, a lightweight vision-language model (see #197). It includes updates to the documentation, CLI, core model implementation, and additional utilities for training, inference, and object detection. Below is a summary of the most important changes grouped by theme.

CLI Enhancements

  • Registered SmolVLM2 commands (info, predict, and train) in the CLI via maestro/cli/introspection.py and maestro/trainer/models/smolvlm2/entrypoint.py. These commands enable fine-tuning, inference, and model information retrieval directly from the command line.

Core Model Implementation

  • Added maestro/trainer/models/smolvlm2/core.py, which includes the SmolVLM2Core class for model initialization, input processing, text generation, and training. It supports optimization strategies like QLoRA, LoRA, and freezing the vision encoder.
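
To illustrate what "freezing the vision encoder" usually amounts to, here is a minimal sketch that disables gradients for every parameter whose name starts with a given prefix. It is not taken from core.py: the helper name `freeze_by_prefix`, the `vision_model.` prefix, and the parameter names are all assumptions, and the stand-in objects only mimic the `requires_grad` attribute that PyTorch parameters expose.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def freeze_by_prefix(named_parameters: Iterable[Tuple[str, object]], prefix: str) -> int:
    """Set requires_grad=False on every parameter whose name starts with prefix.

    Hypothetical helper: accepts (name, parameter) pairs such as those yielded
    by torch.nn.Module.named_parameters(). Returns how many were frozen.
    """
    frozen = 0
    for name, param in named_parameters:
        if name.startswith(prefix):
            param.requires_grad = False
            frozen += 1
    return frozen


# Stand-in parameters; the names are illustrative, not the real SmolVLM2 layout.
params = {
    "vision_model.encoder.weight": SimpleNamespace(requires_grad=True),
    "vision_model.encoder.bias": SimpleNamespace(requires_grad=True),
    "text_model.layers.0.weight": SimpleNamespace(requires_grad=True),
}
frozen_count = freeze_by_prefix(params.items(), prefix="vision_model.")
```

With a real PyTorch module, the same helper could be called as `freeze_by_prefix(model.named_parameters(), prefix)`, where the right prefix depends on how the loaded model names its vision tower.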

Utility Functions

  • Introduced checkpoint utilities in maestro/trainer/models/smolvlm2/checkpoints.py for saving and loading model checkpoints, including metadata.
  • Added maestro/trainer/models/smolvlm2/detection.py for converting SmolVLM2 text outputs into object detection formats and vice versa, as well as formatting prompts for detection tasks.
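
The round trip between generated text and detections can be sketched roughly as follows. The text format used here (one `<label> [x_min, y_min, x_max, y_max]` entry per line) is an assumption made for illustration, as are the names `Detection`, `text_to_detections`, and `detections_to_text`; the actual format and function names in detection.py may differ.

```python
import re
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str
    box: Tuple[float, ...]  # (x_min, y_min, x_max, y_max)


# Assumed line format: "<label> [x_min, y_min, x_max, y_max]"
_LINE = re.compile(r"^\s*(?P<label>.+?)\s*\[\s*(?P<coords>[^\]]+)\]\s*$")


def text_to_detections(text: str) -> List[Detection]:
    """Parse model output into detections, skipping malformed lines."""
    detections = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # line does not look like a detection
        try:
            coords = tuple(float(v) for v in match.group("coords").split(","))
        except ValueError:
            continue  # non-numeric coordinate
        if len(coords) != 4:
            continue  # a box needs exactly four values
        detections.append(Detection(match.group("label"), coords))
    return detections


def detections_to_text(detections: List[Detection]) -> str:
    """Serialize detections back into the assumed text format."""
    lines = []
    for det in detections:
        coords = ", ".join(f"{value:g}" for value in det.box)
        lines.append(f"{det.label} [{coords}]")
    return "\n".join(lines)


raw = "cat [10, 20, 110, 220]\nnot a detection\ndog [5.5, 6, 50, 60]"
parsed = text_to_detections(raw)
round_trip = detections_to_text(parsed)
```

Skipping malformed lines rather than raising is a deliberate choice in this sketch, since generated text is not guaranteed to follow the prompt's format.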

Inference and Entrypoint

  • Implemented SmolVLM2Inference and integrated it into the main entrypoint in maestro/trainer/models/smolvlm2/entrypoint.py, enabling flexible inference workflows via both CLI and Python.

List any dependencies that are required for this change.

  • accelerate>=1.2.1
  • peft>=0.12
  • torch>=2.4.0
  • torchvision>=0.20.0
  • transformers>=4.49.0
  • bitsandbytes>=0.45.0
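
For anyone wanting to confirm their environment meets these minimums before trying the branch, a small standard-library check along these lines may help. It is only a sketch: the helper names are made up for illustration, and the version comparison looks at the leading numeric release segment only, so pre-release or local tags such as `.dev0` or `+cu121` are ignored (the `packaging` library handles those cases properly).

```python
import re
from importlib import metadata

# Minimum versions as listed in this PR description.
MINIMUM_VERSIONS = {
    "accelerate": "1.2.1",
    "peft": "0.12",
    "torch": "2.4.0",
    "torchvision": "0.20.0",
    "transformers": "4.49.0",
    "bitsandbytes": "0.45.0",
}


def release_tuple(version: str) -> tuple:
    """Leading numeric release segment with trailing zeros dropped.

    For example '2.4.0+cu121' becomes (2, 4), so '0.12' and '0.12.0' compare equal.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group().split(".")] if match else []
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_minimums(minimums: dict) -> dict:
    """Map each package to 'ok', 'too old (<installed>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = release_tuple(installed) >= release_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_minimums(MINIMUM_VERSIONS).items():
        print(f"{package}: {status}")
```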

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Testing in progress

Docs

  • Docs updated? What were the changes:
  • Added SmolVLM2-specific installation, training (CLI and Python), and inference instructions to docs/index.md.
  • Created a dedicated docs/models/smolvlm2.md file with an overview, installation steps, training options, inference examples, and object detection capabilities.

CLAassistant commented May 10, 2025

CLA assistant check
All committers have signed the CLA.

bonninr commented May 16, 2025

Voting +1 for this feature being reviewed.

SkalskiP (Collaborator) commented Jun 6, 2025

Hi @AshAnand34, thanks a lot for this PR! SmolVLM and SmolVLM2 have been on our radar for a while now. However, after a deeper review, we realized that your PR doesn’t fully align with the conventions we follow in the maestro repository. For this reason, I asked @AlexBodner to build on top of your code and make the necessary adjustments.

As a result, I’m closing this PR. You can find the updated version here: #207. It includes both your commits and those from @AlexBodner, so your contribution history remains intact.

SkalskiP closed this Jun 6, 2025