Override HF config.json via CLI #3722

Merged: lvhan028 merged 11 commits into InternLM:main from CUHKSZzxy:hf-config-overrides on Jul 15, 2025

@CUHKSZzxy CUHKSZzxy commented Jul 9, 2025

Objective

Allows users to change the HF config.json via CLI, which refers to

TODO

  • Support TM backend
  • OpenCompass evaluations

Usage

For example, to enable dynamic RoPE scaling from the CLI, either pass the overrides as inline JSON

lmdeploy serve api_server \
        ${model_path} \
        --hf-overrides '{"rope_scaling": {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}}' \
        --log-level INFO

or pass arbitrary JSON keys individually as dotted arguments

lmdeploy serve api_server \
        ${model_path} \
        --tp 2 \
        --hf-overrides.rope_scaling.rope_type "yarn" \
        --hf-overrides.rope_scaling.factor 4.0 \
        --hf-overrides.rope_scaling.original_max_position_embeddings 32768 \
        --log-level INFO
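
To illustrate how the two forms can describe the same override, here is a minimal sketch that folds dotted `--hf-overrides.<key>` arguments into a nested dict. The helper names `parse_value` and `fold_dotted_overrides` are hypothetical and are not taken from lmdeploy's argument parser; the sketch only assumes that each dotted flag is followed by exactly one value.

```python
import json
from typing import Any


def parse_value(raw: str) -> Any:
    """Interpret a CLI string as JSON when possible, else keep it as a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def fold_dotted_overrides(argv: list[str], prefix: str = "--hf-overrides") -> dict:
    """Hypothetical helper: collect override flags from argv into one nested dict."""
    overrides: dict = {}
    tokens = iter(argv)
    for token in tokens:
        if token == prefix:
            # Inline form: the next token is a complete JSON object.
            overrides.update(json.loads(next(tokens)))
        elif token.startswith(prefix + "."):
            # Dotted form: each path segment names one nesting level.
            keys = token[len(prefix) + 1:].split(".")
            value = parse_value(next(tokens))
            node = overrides
            for key in keys[:-1]:
                node = node.setdefault(key, {})
            node[keys[-1]] = value
    return overrides


argv = [
    "--tp", "2",
    "--hf-overrides.rope_scaling.rope_type", "yarn",
    "--hf-overrides.rope_scaling.factor", "4.0",
    "--hf-overrides.rope_scaling.original_max_position_embeddings", "32768",
    "--log-level", "INFO",
]
print(fold_dotted_overrides(argv))
```

Under these assumptions, the dotted arguments produce the same nested dict as the inline JSON string, with `4.0` parsed as a float and `32768` as an int.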

OpenCompass (OC) evaluations

  • Qwen/Qwen3-30B-A3B, PT, {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}
image
  • Qwen/Qwen3-30B-A3B, TM, {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 32768}
image
  • Qwen/Qwen3-30B-A3B, PT, default rope scaling
image

Bug Fix

Modifications to lmdeploy/turbomind/turbomind.py.

When OpenCompass passes nested dict parameters, they are wrapped in the mmengine ConfigDict type. With the lmdeploy TurboMind backend, yaml.safe_dump then fails with the following error:

Error Trace
[2025-07-10 14:51:40] 07/10 14:51:40 - OpenCompass - INFO - Task [qwen3-30b-a3b/LongBenchv2_6,qwen3-30b-a3b/Length5000Depth31_2needle_en_8k,qwen3-30b-a3b/Length5000Depth73_2needle_en_8k,qwen3-30b-a3b/Length6000Depth10_2needle_en_8k,qwen3-30b-a3b/Length6000Depth52_2needle_en_8k,qwen3-30b-a3b/Length6000Depth94_2needle_en_8k,qwen3-30b-a3b/Length7000Depth31_2needle_en_8k,qwen3-30b-a3b/Length7000Depth73_2needle_en_8k,qwen3-30b-a3b/Length8000Depth10_2needle_en_8k,qwen3-30b-a3b/Length8000Depth52_2needle_en_8k,qwen3-30b-a3b/Length8000Depth94_2needle_en_8k,qwen3-30b-a3b/Length5000Depth31_3needle_en_8k,qwen3-30b-a3b/Length5000Depth73_-a3b/Length7000Depth31_origin_en_8k,qwen3-30b-a3b/Length7000Depth73_origin_en_8k,qwen3-30b-a3b/Length8000Depth...
[2025-07-10 14:51:43] 2025-07-10 14:51:43,545 - lmdeploy - WARNING - config.py:170 - Overriding HF config with {'rope_scaling': {'factor': 4.0, 'original_max_position_embeddings': 32768, 'rope_type': 'yarn'}}
[2025-07-10 14:51:43] Traceback (most recent call last):
[2025-07-10 14:51:43]   File "/opencompass@f1e50d4/opencompass/tasks/openicl_infer.py", line 161, in <module>
[2025-07-10 14:51:43]     inferencer.run()
[2025-07-10 14:51:43]   File "/opencompass@f1e50d4/opencompass/tasks/openicl_infer.py", line 73, in run
[2025-07-10 14:51:43]     self.model = build_model_from_cfg(model_cfg)
[2025-07-10 14:51:43]   File "/opencompass@f1e50d4/opencompass/utils/build.py", line 24, in build_model_from_cfg
[2025-07-10 14:51:43]     return MODELS.build(model_cfg)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
[2025-07-10 14:51:43]     return self.build_func(cfg, *args, **kwargs, registry=self)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
[2025-07-10 14:51:43]     obj = obj_cls(**args)  # type: ignore
[2025-07-10 14:51:43]   File "/opencompass@f1e50d4/opencompass/models/turbomind_with_tf_above_v4_33.py", line 57, in __init__
[2025-07-10 14:51:43]     self.pipe = self._build_pipe(path, backend, _engine_config)
[2025-07-10 14:51:43]   File "/opencompass@f1e50d4/opencompass/models/turbomind_with_tf_above_v4_33.py", line 203, in _build_pipe
[2025-07-10 14:51:43]     return pipeline(model_path, backend_config=backend_config, log_level='WARNING')
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/api.py", line 83, in pipeline
[2025-07-10 14:51:43]     return pipeline_class(model_path,
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 281, in __init__
[2025-07-10 14:51:43]     self._build_turbomind(model_path=model_path, backend_config=backend_config, **kwargs)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 332, in _build_turbomind
[2025-07-10 14:51:43]     self.engine = tm.TurboMind.from_pretrained(model_path,
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 381, in from_pretrained
[2025-07-10 14:51:43]     return cls(model_path=pretrained_model_name_or_path,
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 154, in __init__
[2025-07-10 14:51:43]     self.model_comm = self._from_hf(model_source=model_source,
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/lmdeploy/turbomind/turbomind.py", line 276, in _from_hf
[2025-07-10 14:51:43]     config=yaml.safe_dump(self.config_dict),
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/__init__.py", line 269, in safe_dump
[2025-07-10 14:51:43]     return dump_all([data], stream, Dumper=SafeDumper, **kwds)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/__init__.py", line 241, in dump_all
[2025-07-10 14:51:43]     dumper.represent(data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 27, in represent
[2025-07-10 14:51:43]     node = self.represent_data(data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
[2025-07-10 14:51:43]     node = self.yaml_representers[data_types[0]](self, data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
[2025-07-10 14:51:43]     return self.represent_mapping('tag:yaml.org,2002:map', data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
[2025-07-10 14:51:43]     node_value = self.represent_data(item_value)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
[2025-07-10 14:51:43]     node = self.yaml_representers[data_types[0]](self, data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
[2025-07-10 14:51:43]     return self.represent_mapping('tag:yaml.org,2002:map', data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
[2025-07-10 14:51:43]     node_value = self.represent_data(item_value)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 48, in represent_data
[2025-07-10 14:51:43]     node = self.yaml_representers[data_types[0]](self, data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 207, in represent_dict
[2025-07-10 14:51:43]     return self.represent_mapping('tag:yaml.org,2002:map', data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 118, in represent_mapping
[2025-07-10 14:51:43]     node_value = self.represent_data(item_value)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 58, in represent_data
[2025-07-10 14:51:43]     node = self.yaml_representers[None](self, data)
[2025-07-10 14:51:43]   File "/oc-v038-ld-v063whl/lib/python3.10/site-packages/yaml/representer.py", line 231, in represent_undefined
[2025-07-10 14:51:43]     raise RepresenterError("cannot represent an object", data)
[2025-07-10 14:51:43] yaml.representer.RepresenterError: ('cannot represent an object', {'factor': 4.0, 'original_max_position_embeddings': 32768, 'rope_type': 'yarn'})
[2025-07-10 14:51:44] 
Sample Code to Reproduce
import yaml
import json
from mmengine.config import ConfigDict

# 1. Define a config using `mmengine.ConfigDict` to mimic OpenCompass.
# The nested dictionary for `hf_overrides` also becomes a `ConfigDict`.
problematic_config = ConfigDict(
    hf_overrides={
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 32768
        }
    })

# 2. Attempt to dump the problematic config. This will fail.
print("Attempting to dump the raw ConfigDict...")
try:
    yaml.safe_dump(problematic_config)
except yaml.representer.RepresenterError as e:
    print(f"-> Failed as expected. PyYAML cannot serialize {type(e.args[1])}.")
    print(f"\n{e}")

# 3. Sanitize the config with a JSON round trip and try again. This will work.
print("\nAttempting to dump the sanitized config...")
sanitized_config = json.loads(json.dumps(problematic_config))
try:
    yaml.safe_dump(sanitized_config)
    print("-> Succeeded. The sanitized config was dumped successfully.")
except Exception as e:
    print(f"-> Failed unexpectedly: {e}")

Outputs:

Attempting to dump the raw ConfigDict...
-> Failed as expected. PyYAML cannot serialize <class 'mmengine.config.config.ConfigDict'>.

('cannot represent an object', {'hf_overrides': {'rope_scaling': {'rope_type': 'yarn', 'factor': 4.0, 'original_max_position_embeddings': 32768}}})

Attempting to dump the sanitized config...
-> Succeeded. The sanitized config was dumped successfully.
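
As a variant of the JSON round trip in the sample, the conversion can also be written as an explicit recursive rebuild into builtin containers. This is only a sketch and not necessarily what the change in turbomind.py does: `FakeConfigDict` is a hypothetical stand-in for a dict subclass such as mmengine's ConfigDict, and `to_plain` is an illustrative name.

```python
from typing import Any


class FakeConfigDict(dict):
    """Hypothetical stand-in for a dict subclass such as mmengine's ConfigDict."""


def to_plain(obj: Any) -> Any:
    """Recursively rebuild mappings as builtin dict and sequences as builtin list."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    # Leaf values (str, int, float, None, ...) are returned unchanged.
    return obj


wrapped = FakeConfigDict(
    rope_scaling=FakeConfigDict(
        rope_type="yarn",
        factor=4.0,
        original_max_position_embeddings=32768,
    )
)
plain = to_plain(wrapped)
print(type(plain), type(plain["rope_scaling"]))
```

Unlike `json.loads(json.dumps(...))`, this version does not turn non-string keys into strings and does not raise on leaf values that JSON cannot encode. The JSON round trip is shorter and is sufficient when the config holds only JSON-compatible values.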

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review July 11, 2025 03:23
tokenizer = Tokenizer(model_path).model.model
model_config = ModelConfig.from_pretrained(model_path, dtype=dtype, dist_config=dist_config)

if misc_config.hf_overrides is not None:

I think we can add hf_overrides as an argument of ModelConfig.from_pretrained and do the process in it.
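
A minimal sketch of what doing the processing in one place could look like, assuming the config is held as a plain dict and that overrides are merged key by key into nested dicts. The function name `apply_overrides` and the base config values are hypothetical; this does not show the actual signature or behavior of `ModelConfig.from_pretrained`.

```python
import copy


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with `overrides` merged in, recursing into nested dicts."""
    merged = copy.deepcopy(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge so that untouched sibling keys survive.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Otherwise the override replaces the existing value (or adds a new key).
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative base config; the values are made up for this example.
base = {"hidden_size": 2048, "rope_scaling": None}
overrides = {
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    }
}
print(apply_overrides(base, overrides))
```

Returning a merged copy instead of mutating the input keeps the originally loaded config available for logging or comparison.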

Comment thread lmdeploy/turbomind/deploy/config.py Outdated

@grimoire grimoire left a comment

LGTM

@lvhan028 lvhan028 merged commit cde5a5e into InternLM:main Jul 15, 2025
5 checks passed
@CUHKSZzxy CUHKSZzxy deleted the hf-config-overrides branch July 16, 2025 02:30
@CUHKSZzxy CUHKSZzxy mentioned this pull request Sep 2, 2025