Add support for Qwen/Qwen3-0.6B-Base model #1075

mtaran · 2025-10-01T19:44:59Z

Summary

This PR adds support for the base (non-instruct) version of Qwen3-0.6B to TransformerLens.

Motivation

Currently, TransformerLens supports Qwen/Qwen3-0.6B (the instruct version) but not the base version, Qwen/Qwen3-0.6B-Base. Because the two models share the same architecture and differ only in their weights, adding support requires nothing more than updating the model lists.

Changes

Added Qwen/Qwen3-0.6B-Base to OFFICIAL_MODEL_NAMES list
Added corresponding alias qwen3-0.6b-base to MODEL_ALIASES dictionary
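
To illustrate what the two additions amount to, here is a minimal sketch using miniature stand-ins for the registries. It assumes MODEL_ALIASES maps each official name to a list of aliases; the alias shown for the existing instruct model and the `resolve_model_name` helper are hypothetical and only there to show how an alias would resolve. This is not the library's own lookup code.

```python
# Miniature stand-ins for the two registries touched by this PR.
# These are NOT the real contents of TransformerLens.
OFFICIAL_MODEL_NAMES = [
    "Qwen/Qwen3-0.6B",
    "Qwen/Qwen3-0.6B-Base",  # entry added by this PR
]

# Assumed shape: official name -> list of accepted aliases.
MODEL_ALIASES = {
    "Qwen/Qwen3-0.6B": ["qwen3-0.6b"],  # assumed alias, for illustration only
    "Qwen/Qwen3-0.6B-Base": ["qwen3-0.6b-base"],  # alias added by this PR
}


def resolve_model_name(name: str) -> str:
    """Hypothetical helper: map an official name or alias to the official name."""
    lowered = name.lower()
    for official in OFFICIAL_MODEL_NAMES:
        aliases = MODEL_ALIASES.get(official, [])
        if lowered == official.lower() or lowered in (a.lower() for a in aliases):
            return official
    raise ValueError(f"{name!r} is not a registered model name or alias")
```

With both entries present, `resolve_model_name("qwen3-0.6b-base")` returns the official name; a name missing from both registries raises a `ValueError`.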

Testing

Both models have been verified to:

Load successfully via HookedTransformer.from_pretrained()
Fetch different weights (as expected for base vs instruct models)

Test output:

Instruct model first embedding weight sum: -0.012408
Base model first embedding weight sum: -0.173449
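
A check along these lines could be reproduced with the sketch below. The PR does not say exactly how the "first embedding weight sum" was computed, so this assumes it means the sum of the first row of the token-embedding matrix `W_E`; the helper names are hypothetical, and the numbers it prints may not match the output above if the original check used a different reduction.

```python
def first_embedding_row_sum(model) -> float:
    """Sum the first row of the token-embedding matrix W_E (assumed metric)."""
    return float(model.W_E[0].sum())


def compare_checkpoints(names=("Qwen/Qwen3-0.6B", "Qwen/Qwen3-0.6B-Base")):
    """Load each checkpoint and return {model name: first-row embedding sum}."""
    # Imported lazily so the helper above works without the library installed.
    from transformer_lens import HookedTransformer

    sums = {}
    for name in names:
        # Downloads the weights on first use; CPU is enough for this check.
        model = HookedTransformer.from_pretrained(name, device="cpu")
        sums[name] = first_embedding_row_sum(model)
    return sums
```

Calling `compare_checkpoints()` downloads both checkpoints and returns one value per model; if the two values differ, the base and instruct names are resolving to different weights rather than to the same checkpoint.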

Additional Context

Base model page: https://huggingface.co/Qwen/Qwen3-0.6B-Base
Instruct model page: https://huggingface.co/Qwen/Qwen3-0.6B
The base model is suitable for fine-tuning, while the instruct model is optimized for instruction-following

🤖 Generated with Claude Code

This commit adds support for the base (non-instruct) version of Qwen3-0.6B. The base model (Qwen/Qwen3-0.6B-Base) and instruct model (Qwen/Qwen3-0.6B) share the same architecture but have different weights. The base model is suitable for fine-tuning, while the instruct model is optimized for instruction-following and chat.

Changes:
- Added "Qwen/Qwen3-0.6B-Base" to OFFICIAL_MODEL_NAMES
- Added alias "qwen3-0.6b-base" to MODEL_ALIASES

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

Add Qwen/Qwen3-0.6B-Base to the free_compatible list in the Colab_Compatibility notebook to ensure all models in OFFICIAL_MODEL_NAMES are accounted for in the test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

Update the model count in Colab_Compatibility notebook output from 216 to 217 to reflect the addition of Qwen3-0.6B-Base.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>

name and others added 3 commits October 1, 2025 12:44

Fix notebook output to reflect 217 models

5876095

