@mtaran mtaran commented Oct 1, 2025

Summary

This PR adds support for the base (non-instruct) version of Qwen3-0.6B to TransformerLens.

Motivation

Currently, TransformerLens supports Qwen/Qwen3-0.6B (the instruct version), but not the base version, Qwen/Qwen3-0.6B-Base. Since the two models share the same architecture and differ only in their weights, adding support requires only registering the new name in the model lists.

Changes

  • Added Qwen/Qwen3-0.6B-Base to OFFICIAL_MODEL_NAMES list
  • Added corresponding alias qwen3-0.6b-base to MODEL_ALIASES dictionary
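The two changes above can be sketched with stand-in structures. This is a minimal illustration, not the actual TransformerLens source: it assumes `OFFICIAL_MODEL_NAMES` is a list of Hugging Face repository names and `MODEL_ALIASES` maps each official name to a list of aliases. The alias shown for the existing instruct entry and the `resolve` helper are hypothetical, introduced only to show how an alias lookup could work.

```python
# Stand-ins for the two registries, trimmed to the Qwen3-0.6B entries.
OFFICIAL_MODEL_NAMES = [
    "Qwen/Qwen3-0.6B",
    "Qwen/Qwen3-0.6B-Base",  # entry added by this PR
]

MODEL_ALIASES = {
    "Qwen/Qwen3-0.6B": ["qwen3-0.6b"],            # assumed alias, for illustration
    "Qwen/Qwen3-0.6B-Base": ["qwen3-0.6b-base"],  # alias added by this PR
}


def resolve(name: str) -> str:
    """Hypothetical helper: map an alias or official name to the official name."""
    lowered = name.lower()
    for official in OFFICIAL_MODEL_NAMES:
        aliases = [alias.lower() for alias in MODEL_ALIASES.get(official, [])]
        if lowered == official.lower() or lowered in aliases:
            return official
    raise ValueError(f"{name} is not a registered model name or alias")
```

With both entries present, `resolve("qwen3-0.6b-base")` and `resolve("Qwen/Qwen3-0.6B-Base")` point at the same checkpoint, while the instruct name keeps resolving to its own entry.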

Testing

Both the instruct and base models have been verified to:

  1. Load successfully via HookedTransformer.from_pretrained()
  2. Load different weights from each other (as expected for base vs instruct models)

Test output:

Instruct model first embedding weight sum: -0.012408
Base model first embedding weight sum: -0.173449
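A check along these lines could be reproduced roughly as follows. This is a sketch under assumptions: the PR does not show the script that produced the output, so reading "first embedding weight sum" as the sum over the first row of the embedding matrix `W_E` is a guess, and the helper names `first_row_sum`, `sums_differ`, and `compare_checkpoints` are hypothetical. `compare_checkpoints` downloads both checkpoints, so it is defined here but not called.

```python
import math


def first_row_sum(embedding) -> float:
    """Sum the first row of a 2-D embedding given as nested sequences."""
    return math.fsum(float(x) for x in embedding[0])


def sums_differ(sum_a: float, sum_b: float, tol: float = 1e-6) -> bool:
    """Treat two checkpoints as distinct if their sums differ by more than tol."""
    return abs(sum_a - sum_b) > tol


def compare_checkpoints(
    instruct_name: str = "Qwen/Qwen3-0.6B",
    base_name: str = "Qwen/Qwen3-0.6B-Base",
) -> dict:
    """Load both models and return the first-row embedding sum for each.

    Imported lazily because it needs transformer_lens and network access.
    """
    from transformer_lens import HookedTransformer

    sums = {}
    for name in (instruct_name, base_name):
        model = HookedTransformer.from_pretrained(name)
        # W_E is the token embedding matrix; convert to plain Python floats.
        rows = model.W_E.detach().float().cpu().tolist()
        sums[name] = first_row_sum(rows)
    return sums
```

A matching sum would not prove the weights are identical, but a clearly different one, as in the output above, is enough to show that the two names resolve to different checkpoints.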

Additional Context

🤖 Generated with Claude Code

name and others added 3 commits October 1, 2025 12:44
This commit adds support for the base (non-instruct) version of Qwen3-0.6B.
The base model (Qwen/Qwen3-0.6B-Base) and instruct model (Qwen/Qwen3-0.6B)
share the same architecture but have different weights. The base model is
suitable for fine-tuning, while the instruct model is optimized for
instruction-following and chat.

Changes:
- Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
- Added alias "qwen3-0.6b-base" to MODEL_ALIASES

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the
Colab_Compatibility notebook to ensure all models in OFFICIAL_MODEL_NAMES
are accounted for in the test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Update the model count in Colab_Compatibility notebook output
from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>