
Fix issues with multiple image handling (Pixtral)#78

Merged
Blaizzy merged 1 commit into Blaizzy:pc/tuner from hiima234:rx/pixtral_fix
Oct 11, 2024

Conversation

@hiima234 (Contributor) commented Oct 9, 2024

@Blaizzy (Owner) commented Oct 11, 2024

@hiima234 thank you very much!

@Blaizzy Blaizzy merged commit 1f3eabd into Blaizzy:pc/tuner Oct 11, 2024
@Blaizzy (Owner) commented Oct 11, 2024

Really nice catch! 🔥

@Blaizzy (Owner) commented Oct 11, 2024

It's more consistent now!

Especially with quantized models!

[Screenshot: 2024-10-11 at 6:13:56 PM]

@Blaizzy (Owner) commented Oct 11, 2024

What is your X account handle?

@Blaizzy Blaizzy changed the title Fix issues with multiple image handling Fix issues with multiple image handling (Pixtral) Oct 11, 2024
Blaizzy added a commit that referenced this pull request Oct 11, 2024
* remove torch and mlx-lm

* remove torch and mlx-lm

* add peft model creation

* use tree flatten

* add dataset loader

* fix dataset

* fix masks and rename dataset

* support batch processing and train on completions

* fix trainer

* formatting

* add support for none splits and fix assistant id

* Add lora script and docs

* remove torch and mlx-lm

* add peft model creation

* use tree flatten

* add dataset loader

* fix dataset

* fix masks and rename dataset

* support batch processing and train on completions

* fix trainer

* formatting

* add support for none splits and fix assistant id

* Add lora script and docs

* remove duplicates

* fix batch load

* load trained adapters and add super to all models

* fix pixtral quant

* speed up qwen batch processing

* fix qlora training

* fix dataloader

* formatting

* fix pixtral pixel loading

* fix lora and dataset

* add batch processing support for qwen2_vl

* update lora docs

* add unit tests

* set stage for phi3_v support

* update logs and readme

* add utils tests and remove unused collate fn

* refactor prompt utils and add multi-image support for pixtral

* add llava interleave support

* multi image support

* add image resizing

* refactor data loading

* update data processing and tqdm

* add llava interleave

* formatting

* add list of models with multi-image support

* remove trimmed labels

* remove warning

* pin reqs

* add config dict condition

* fix pixtral FT prompt

* formatting images

* remove unused

* update trainer init

* update lora

* update md and formatting

* bump version

* add tests for pixtral and qwen2_vl

* add tests for pixtral

* Merge branch 'pc/tuner' of https://github.com/Blaizzy/mlx-vlm into pc/tuner

* fix test

* remove rope scaling

* remove test args and update MD

* format dataset defaults

* add dataset formatting info

* Fix issues with multiple image handling (#78)

1. [IMG_BREAK] and [IMG_END] tokens are lost after embedding.
2. Image position encoding should be done on a per-image basis:
    https://github.com/mistralai/mistral-inference/blob/main/src/mistral_inference/vision_encoder.py#L85
    https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/modeling_pixtral.py#L492

Co-authored-by: Roger Xu <rogerxu@gmail.com>

* fix styling

* update model

* update default model

* rewrite comments

---------

Co-authored-by: hiima234 <98786318+hiima234@users.noreply.github.com>
Co-authored-by: Roger Xu <rogerxu@gmail.com>
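The two fixes described in the commit message above can be sketched as follows. This is a minimal, hypothetical illustration (the token ids, function names, and patch size are assumptions, not the actual mlx-vlm API): image-patch embeddings replace only the [IMG] placeholder positions, so the embeddings of [IMG_BREAK] and [IMG_END] survive the merge, and 2D patch positions are computed independently for each image rather than numbered globally.

```python
import numpy as np

# Assumed token ids for illustration only.
IMG_TOKEN, IMG_BREAK, IMG_END = 10, 12, 13

def merge_image_embeddings(input_ids, text_embeds, image_embeds):
    """Splice image-patch embeddings into the text embedding sequence.

    Only the [IMG] placeholder positions are overwritten, so the
    embeddings at [IMG_BREAK] and [IMG_END] positions are preserved
    (the bug was overwriting the whole image span, losing them).
    """
    merged = text_embeds.copy()
    img_positions = np.where(input_ids == IMG_TOKEN)[0]
    assert len(img_positions) == image_embeds.shape[0]
    merged[img_positions] = image_embeds
    return merged

def per_image_position_ids(image_sizes, patch=16):
    """Compute 2D (row, col) patch positions per image.

    Each image's patch grid restarts at (0, 0), rather than continuing
    a single global numbering across all images in the sequence.
    """
    ids = []
    for h, w in image_sizes:
        rows, cols = h // patch, w // patch
        grid = np.stack(
            np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"),
            axis=-1,
        )
        ids.append(grid.reshape(-1, 2))
    return ids
```

With two images, `per_image_position_ids([(32, 32), (16, 48)])` yields separate position lists, each starting at `(0, 0)`, matching the per-image encoding used in the referenced mistral-inference and transformers sources.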
Garry-TI pushed a commit to Garry-TI/mlx-vlm that referenced this pull request Sep 23, 2025
