@Gusanidas Gusanidas commented Feb 27, 2025

Description

Two-line change so that the rope_base parameter in the configuration is taken from the Hugging Face config if available.

When loading a HookedTransformer from a Hugging Face model in one of the known architectures, the rope base (also called rope theta) may have a different value in the Hugging Face model than the default for that architecture.

This is the case, for example, when loading "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" into the "Qwen 2.5" architecture.
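For context, the divergence can be checked directly from the Hugging Face config (a minimal sketch using `transformers`; the printed value is simply whatever the checkpoint's config reports, which differs from the Qwen 2.5 default):

```python
from transformers import AutoConfig

# The distilled checkpoint carries its own rope_theta, which differs from the
# default used for the Qwen 2.5 architecture, so taking it from the HF config
# avoids loading the model with the wrong rotary base.
hf_cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(hf_cfg.rope_theta)
```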

The code already takes the values for "d_vocab" and "load_in_4bit" from the HF config; I am extending this to the rope base as well.

I have limited it to Qwen models because otherwise many tests fail, but if this change is considered useful and accepted, I can work on the general case and add the relevant tests.
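A minimal sketch of the kind of change described, for reference only; the real edit lives in TransformerLens's HF-config conversion code, and the function and field names here (`convert_hf_config`, `"rotary_base"`) are illustrative assumptions rather than a verbatim copy of the diff:

```python
# Hypothetical sketch: read rope_theta from the Hugging Face config when
# building the HookedTransformer config dict, limited to Qwen architectures.
def convert_hf_config(hf_config, cfg_dict: dict) -> dict:
    # Existing behaviour mentioned above: d_vocab is already taken from the HF config.
    cfg_dict["d_vocab"] = hf_config.vocab_size

    # Proposed addition: prefer the HF rope_theta when present, so a checkpoint
    # like the DeepSeek distill does not silently inherit the architecture default.
    architecture = getattr(hf_config, "architectures", [""])[0]
    if "Qwen" in architecture and hasattr(hf_config, "rope_theta"):
        cfg_dict["rotary_base"] = hf_config.rope_theta

    return cfg_dict
```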

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@bryce13950 bryce13950 merged commit 50692c8 into TransformerLensOrg:dev Jun 10, 2025
13 checks passed