
llava: fix clip-model-is-vision flag in README.md #5509

Merged
merged 2 commits into master from llava-clip-model-is-vision on Feb 16, 2024

Conversation

@danbev (Collaborator) commented Feb 15, 2024

This commit fixes the flag `--clip_model_is_vision` in README.md, which does not match the actual flag:
```console
$ python convert-image-encoder-to-gguf.py --help
...
  --clip-model-is-vision
                        The clip model is a pure vision model
                        (ShareGPT4V vision extract for example)
```

Signed-off-by: Daniel Bevenius <[email protected]>
@cmp-nct (Contributor) commented Feb 15, 2024

The arg is checked like this in the python script:
`if args.clip_model_is_vision:`
But maybe both ways work in python.

That said: the argument as I wrote it in the readme works. Maybe both variants work?
If only the original variant works, we should update the python script instead (the help message shown).

@danbev (Collaborator, Author) commented Feb 15, 2024

> That said: the argument as I wrote it in the readme works. Maybe both variants work?

I ran into an issue copying the command from README and got the following error:

```console
(llava-venv) $ python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip_model_is_vision
usage: convert-image-encoder-to-gguf.py [-h] -m MODEL_DIR [--use-f32] [--text-only] [--vision-only]
                                        [--clip-model-is-vision] [--clip-model-is-openclip]
                                        [--llava-projector LLAVA_PROJECTOR] [--projector-type {mlp,ldp}]
                                        [-o OUTPUT_DIR] [--image-mean IMAGE_MEAN [IMAGE_MEAN ...]]
                                        [--image-std IMAGE_STD [IMAGE_STD ...]]
convert-image-encoder-to-gguf.py: error: unrecognized arguments: --clip_model_is_vision
```

I might be missing something as it works for you. I'd be happy to update if needed, just wondering about consistency with the other options, which seem to use `-` and not `_`.

@cmp-nct (Contributor) commented Feb 15, 2024

It looks like the python file was indeed changed; on my side it was still with underscores!

It looks like python adapts arguments with a "-" internally to underscores, as the minus is illegal in variable names.
It feels a bit wrong to me that the argument differs from the variable name it turns into, but your change indeed fixes it.
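For illustration, a minimal argparse sketch of this dash-to-underscore conversion (the flag name matches the script; the rest is a standalone example, not the actual conversion script):

```python
import argparse

parser = argparse.ArgumentParser()
# Dashes in the option name become underscores in the parsed namespace,
# so the flag is typed with dashes but read back with underscores.
parser.add_argument("--clip-model-is-vision", action="store_true")

args = parser.parse_args(["--clip-model-is-vision"])
print(args.clip_model_is_vision)  # True

# The underscore spelling is rejected on the command line:
# parser.parse_args(["--clip_model_is_vision"])
# -> error: unrecognized arguments: --clip_model_is_vision
```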

@danbev (Collaborator, Author) commented Feb 15, 2024

@cmp-nct May I bother you with another issue I ran into (after the issue in this PR) which is:

```console
Traceback (most recent call last):
  File "/home/danielbevenius/work/ai/llama.cpp/./examples/llava/convert-image-encoder-to-gguf.py", line 199, in <module>
    fout.add_uint32("clip.vision.image_size", v_hparams["image_size"])
                                              ~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'image_size'
```

I was able to work around this by changing:

```console
(llava-venv) $ git diff examples
diff --git a/examples/llava/convert-image-encoder-to-gguf.py b/examples/llava/convert-image-encoder-to-gguf.py
index c69f89ac..94754a47 100644
--- a/examples/llava/convert-image-encoder-to-gguf.py
+++ b/examples/llava/convert-image-encoder-to-gguf.py
@@ -117,7 +117,7 @@ else:
 with open(dir_model + "/config.json", "r", encoding="utf-8") as f:
     config = json.load(f)
     if args.clip_model_is_vision:
-        v_hparams = config
+        v_hparams = config["vision_config"]
         t_hparams = None
     else:
         v_hparams = config["vision_config"]
```

I'm using the config.json suggested in the README but not sure if this is a correct "fix". With this change I'm able to run the llava-cli with llava-v1.6-vicuna-7b/ggml-model-f16.gguf 🎉
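For context, a hedged illustration of why the KeyError shows up here (the nesting follows the traceback and the reply below; the field values are invented):

```python
# Illustrative shape of a full CLIP config.json (field values invented):
full_clip_config = {
    "text_config": {"hidden_size": 768},
    "vision_config": {"image_size": 336, "patch_size": 14, "hidden_size": 1024},
}

# With --clip-model-is-vision the script sets v_hparams = config, so on a
# config shaped like the above, v_hparams["image_size"] raises KeyError;
# the vision fields only exist under config["vision_config"].
print(full_clip_config["vision_config"]["image_size"])  # 336
```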

@cmp-nct (Contributor) commented Feb 15, 2024

> @cmp-nct May I bother you with another issue I ran into (after the issue in this PR) [...] I'm using the config.json suggested in the README but not sure if this is a correct "fix".

The original was correct; what the script could use is automatic detection. Though there is a lot that needs to be done with the current llava and clip implementation ;)

When you use `--clip-model-is-vision` the conversion script expects a vision tower from CLIP and a configuration for that vision tower.
When you omit this argument the conversion script expects a full CLIP model and its configuration file.

So `config["vision_config"]` is the configuration of the vision tower inside a full CLIP model config.

Now the problem is that my config.json on Hugging Face is not the one I used for the ViT, I just noticed that.
I uploaded a new config_vit.json (https://huggingface.co/cmp-nct/llava-1.6-gguf/blob/main/config_vit.json) which contains the config you should use for the pure vision tower (it needs to be renamed).
Can you add that change to your PR?
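A minimal sketch of the automatic-detection idea mentioned above (not what the script currently does; it only assumes that a full CLIP config nests the vision fields under "vision_config"):

```python
import json

def load_vision_hparams(config_path: str) -> dict:
    """Return the vision-tower hparams from either config.json layout (sketch only)."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Full CLIP config: the vision tower's fields live under "vision_config".
    if "vision_config" in config:
        return config["vision_config"]
    # Pure vision-tower config: fields such as "image_size" are at top level.
    return config
```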

@ggerganov merged commit 60ed04c into ggerganov:master Feb 16, 2024
21 checks passed
@danbev deleted the llava-clip-model-is-vision branch February 16, 2024 11:11
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* llava: fix clip-model-is-vision flag in README.md

This commit fixes the flag `--clip_model_is_vision` in README.md, which
does not match the actual flag:
```console
$ python convert-image-encoder-to-gguf.py --help
...
  --clip-model-is-vision
                        The clip model is a pure vision model
                        (ShareGPT4V vision extract for example)
```

Signed-off-by: Daniel Bevenius <[email protected]>

* llava: update link to vit config in README.md

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024