llava: fix clip-model-is-vision flag in README.md #5509
Conversation
This commit fixes the flag `--clip_model_is_vision` in README.md, which does not match the actual flag:

```console
$ python convert-image-encoder-to-gguf.py --help
...
  --clip-model-is-vision
                        The clip model is a pure vision model (ShareGPT4V vision extract for example)
```

Signed-off-by: Daniel Bevenius <[email protected]>
The arg is checked like this in the python script: `if args.clip_model_is_vision:`. That said, the argument as I wrote it in the README works. Maybe both variants work?
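For context, a minimal sketch of how the flag is presumably declared in convert-image-encoder-to-gguf.py, assuming standard argparse usage; the help text is copied from the `--help` output above, the rest is an assumption rather than the script's actual code:

```python
import argparse

ap = argparse.ArgumentParser(prog="convert-image-encoder-to-gguf.py")
# Declared with dashes on the command line ...
ap.add_argument("--clip-model-is-vision", action="store_true",
                help="The clip model is a pure vision model "
                     "(ShareGPT4V vision extract for example)")
args = ap.parse_args()

# ... but checked with underscores in the code, since argparse derives
# the attribute name by replacing "-" with "_".
if args.clip_model_is_vision:
    print("treating the model as a pure vision model")
```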
I ran into an issue copying the command from the README and got the following error:

```console
(llava-venv) $ python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip_model_is_vision
usage: convert-image-encoder-to-gguf.py [-h] -m MODEL_DIR [--use-f32] [--text-only] [--vision-only]
[--clip-model-is-vision] [--clip-model-is-openclip]
[--llava-projector LLAVA_PROJECTOR] [--projector-type {mlp,ldp}]
[-o OUTPUT_DIR] [--image-mean IMAGE_MEAN [IMAGE_MEAN ...]]
[--image-std IMAGE_STD [IMAGE_STD ...]]
convert-image-encoder-to-gguf.py: error: unrecognized arguments: --clip_model_is_vision
```

I might be missing something, as it works for you. I'd be happy to update the PR if needed; I was just wondering about consistency with the other options, which all seem to use dashes.
It looks like the python file was indeed changed; on my side it still had underscores! Python's argparse adapts arguments containing a "-" to underscores internally, since the minus sign is not legal in variable names.
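A quick illustration of that behavior (a standalone sketch, not code from the repository): the flag is registered with dashes, argparse exposes it with underscores, and the underscore spelling is not accepted on the command line, which matches the error above.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--clip-model-is-vision", action="store_true")

# Dashes on the command line are fine ...
args = parser.parse_args(["--clip-model-is-vision"])
# ... and the value is exposed with underscores, because "-" is not
# valid in a Python identifier.
print(args.clip_model_is_vision)  # True

# The underscore spelling is rejected at parse time:
#   parser.parse_args(["--clip_model_is_vision"])
#   -> error: unrecognized arguments: --clip_model_is_vision
```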
@cmp-nct May I bother you with another issue I ran into (after the issue in this PR)?

```console
Traceback (most recent call last):
File "/home/danielbevenius/work/ai/llama.cpp/./examples/llava/convert-image-encoder-to-gguf.py", line 199, in <module>
fout.add_uint32("clip.vision.image_size", v_hparams["image_size"])
~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'image_size'
```

I was able to work around this by changing:

```console
(llava-venv) $ git diff examples
diff --git a/examples/llava/convert-image-encoder-to-gguf.py b/examples/llava/convert-image-encoder-to-gguf.py
index c69f89ac..94754a47 100644
--- a/examples/llava/convert-image-encoder-to-gguf.py
+++ b/examples/llava/convert-image-encoder-to-gguf.py
@@ -117,7 +117,7 @@ else:
     with open(dir_model + "/config.json", "r", encoding="utf-8") as f:
         config = json.load(f)
     if args.clip_model_is_vision:
-        v_hparams = config
+        v_hparams = config["vision_config"]
         t_hparams = None
     else:
         v_hparams = config["vision_config"]
```

I'm using the config.json suggested in the README, but I'm not sure whether this is a correct "fix". With this change I'm able to run the conversion script.
The original was correct; what the script could use is automatic detection. There is a lot that still needs to be done with the current llava and clip implementation ;) When you use `--clip-model-is-vision`, the config.json is expected to be a plain vision-model config, so `v_hparams = config` is right in that case; `config["vision_config"]` is the configuration of the vision tower inside a full CLIP model config. Now the problem is that my config.json on Hugging Face is not the one I used for the ViT, I just noticed that.
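A minimal sketch of what such automatic detection might look like (hypothetical, not what the script does today): use the nested vision_config when the file is a full CLIP config, and fall back to the top-level dict when it is already a pure vision config.

```python
import json

def load_vision_hparams(config_path: str) -> dict:
    """Return vision hyper-parameters from either config.json layout."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # A full CLIP config nests the vision tower under "vision_config";
    # a pure vision extract (e.g. ShareGPT4V) keeps keys such as
    # "image_size" at the top level.
    return config.get("vision_config", config)
```

With something along these lines, both a full CLIP config and a ShareGPT4V-style vision extract would work without editing the script or flipping a flag.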
* llava: fix clip-model-is-vision flag in README.md

  This commit fixes the flag `--clip_model_is_vision` in README.md, which does not match the actual flag:

  ```console
  $ python convert-image-encoder-to-gguf.py --help
  ...
    --clip-model-is-vision
                          The clip model is a pure vision model (ShareGPT4V vision extract for example)
  ```

  Signed-off-by: Daniel Bevenius <[email protected]>

* llava: update link to vit config in README.md

  Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>