Gemma 3 - pass-through merge errors #537

Open
David-AU-github opened this issue Mar 14, 2025 · 5 comments

@David-AU-github

Updated to the latest mergekit and the correct transformers version for Gemma 3; getting the following errors.
(Simple passthrough merge of a single model, no other models.)

Model: Gemma 3 12B it (regular, not the "pt" / multimodal version)

NOTE: "mergekit7" is my newest install; I have several.

mergekit-yaml --copy-tokenizer --allow-crimes --cuda --out-shard-size 5B --lazy-unpickle --clone-tensors f:/mergefiles/Gemma-3-12B-exp40-3.txt E:/Gemma-3-12B-exp40-3

WARNING:mergekit.merge:Unable to set number of layers for module multi_modal_projector in output config - you may need to manually correct it.
Traceback (most recent call last):
File "F:\mergekit7\mergekit\mergekit\merge.py", line 300, in _model_out_config
set_config_value(res, cfg_key, module_layers[module_name])
File "F:\mergekit7\mergekit\mergekit\common.py", line 37, in set_config_value
parts = key.split(".")
^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
Warmup loader cache: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Program Files\Python312\Scripts\mergekit-yaml.exe\__main__.py", line 7, in <module>
File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\mergekit7\mergekit\mergekit\options.py", line 123, in wrapper
f(*args, **kwargs)
File "F:\mergekit7\mergekit\mergekit\scripts\run_yaml.py", line 30, in main
run_merge(
File "F:\mergekit7\mergekit\mergekit\merge.py", line 70, in run_merge
).plan_to_disk(out_path=out_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\mergekit7\mergekit\mergekit\plan.py", line 335, in plan_to_disk
self._plan()
File "F:\mergekit7\mergekit\mergekit\plan.py", line 376, in _plan
self.normalize_config()
File "F:\mergekit7\mergekit\mergekit\plan.py", line 105, in normalize_config
raise RuntimeError(
RuntimeError: Model has multiple modules, must use modules: config syntax to work with slices
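
For readers following the first traceback above: it fails at `key.split(".")` because the key passed to `set_config_value` is `None` (here for the `multi_modal_projector` module, per the warning). The snippet below is a minimal sketch of that failure mode, not mergekit's actual code. The function is a hypothetical stand-in with the same name, and the dotted key and the value 48 are only illustrative assumptions.

```python
from typing import Any, Optional


def set_config_value(config: dict, key: Optional[str], value: Any) -> None:
    """Hypothetical stand-in: set a value in a nested dict by dotted key."""
    if key is None:
        # Without this guard, key.split(".") raises
        # AttributeError: 'NoneType' object has no attribute 'split'
        raise ValueError("module has no layer-count key in the config")
    parts = key.split(".")
    node = config
    for part in parts[:-1]:
        # Walk (or create) the intermediate dicts.
        node = node.setdefault(part, {})
    node[parts[-1]] = value


cfg: dict = {}
# Illustrative key and value only.
set_config_value(cfg, "text_config.num_hidden_layers", 48)
print(cfg)

try:
    # A module with no layer-count key maps to None.
    set_config_value(cfg, None, 0)
except ValueError as exc:
    print(f"skipped: {exc}")
```

Note that in the log above this first error is only reported as a warning; the merge actually aborts on the later `RuntimeError` about `modules:` config syntax.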

@cg123
Collaborator

cg123 commented Mar 16, 2025

With models like this, slicing gets a little trickier: since the vision tower and the language model have different numbers of layers, it doesn't make sense to specify a single set of slices for the whole model. You can use this new syntax, which specifies slices per module:

merge_method: passthrough
modules:
  text_decoder:
    # frankenmerge the text model
    slices:
      - sources:
          - model: google/gemma-3-12b-it
            layer_range: [0, 24]
      - sources:
          - model: google/gemma-3-12b-it
            layer_range: [8, 48]
  vision_tower:
    # keep the vision tower as is
    models:
      - model: google/gemma-3-12b-it
    # or also frankenmerge it?
    # slices:
    #   - sources:
    #       - model: google/gemma-3-12b-it
    #         layer_range: [0, 16]
    #   - sources:
    #       - model: google/gemma-3-12b-it
    #         layer_range: [8, 27]
  multi_modal_projector:
    # no layers in this module just a single set of weights
    models:
      - model: google/gemma-3-12b-it
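
As a quick sanity check on the `text_decoder` slices above, the resulting layer count can be worked out by hand or with a few lines of Python. This is only a sketch: it assumes `layer_range` is half-open (`[start, end)`), and the helper names are made up for illustration rather than taken from mergekit.

```python
# The text_decoder slices from the config above, written as plain Python data.
text_decoder_slices = [
    {"model": "google/gemma-3-12b-it", "layer_range": (0, 24)},
    {"model": "google/gemma-3-12b-it", "layer_range": (8, 48)},
]


def merged_layer_count(slices):
    """Sum slice lengths, assuming layer_range is half-open: [start, end)."""
    return sum(end - start for start, end in (s["layer_range"] for s in slices))


def merged_layer_sources(slices):
    """List the source layer index behind each output layer, in order."""
    return [i for s in slices for i in range(*s["layer_range"])]


total = merged_layer_count(text_decoder_slices)
sources = merged_layer_sources(text_decoder_slices)
print(total)            # 64
print(sources[22:27])   # [22, 23, 8, 9, 10]
```

Under that assumption the merged text model has 24 + 40 = 64 layers, with source layers 8 through 23 appearing twice.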

@SicariusSicariiStuff

It gives the following error:

e-tensors --cuda --trust-remote-code
Traceback (most recent call last):
File "/home/sicarius/mergekit/env/bin/mergekit-yaml", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sicarius/mergekit/mergekit/options.py", line 123, in wrapper
f(*args, **kwargs)
File "/home/sicarius/mergekit/mergekit/scripts/run_yaml.py", line 30, in main
run_merge(
File "/home/sicarius/mergekit/mergekit/merge.py", line 40, in run_merge
raise RuntimeError("No output requested")
RuntimeError: No output requested

@cg123
Collaborator

cg123 commented Mar 16, 2025

Sorry about that! Turns out I didn't fully update the config sanity checks and it was not allowing some valid configs. This config works on main now.

@David-AU-github
Author

thank you !!!

@David-AU-github
Author

Quick question:
Does the same format work with DARE TIES? What about other merge types?
