
Fix issues with multiple image handling (Pixtral)#78

Merged
Blaizzy merged 1 commit into Blaizzy:pc/tuner from hiima234:rx/pixtral_fix
Oct 11, 2024

Conversation

@hiima234 (Contributor) commented Oct 9, 2024

@Blaizzy (Owner) commented Oct 11, 2024

@hiima234 thank you very much!

@Blaizzy Blaizzy merged commit 1f3eabd into Blaizzy:pc/tuner Oct 11, 2024
@Blaizzy (Owner) commented Oct 11, 2024

Really nice catch! 🔥

@Blaizzy (Owner) commented Oct 11, 2024

It's more consistent now!

Especially with quantized models!

[Screenshot: 2024-10-11 at 6:13:56 PM]

@Blaizzy (Owner) commented Oct 11, 2024

What is your X account handle?

@Blaizzy Blaizzy changed the title Fix issues with multiple image handling Fix issues with multiple image handling (Pixtral) Oct 11, 2024
Blaizzy added a commit that referenced this pull request Oct 11, 2024
* remove torch and mlx-lm

* remove torch and mlx-lm

* add peft model creation

* use tree flatten

* add dataset loader

* fix dataset

* fix masks and rename dataset

* support batch processing and train on completions

* fix trainer

* formatting

* add support for none splits and fix assistant id

* Add lora script and docs

* remove torch and mlx-lm

* add peft model creation

* use tree flatten

* add dataset loader

* fix dataset

* fix masks and rename dataset

* support batch processing and train on completions

* fix trainer

* formatting

* add support for none splits and fix assistant id

* Add lora script and docs

* remove duplicates

* fix batch load

* load trained adapters and add super to all models

* fix pixtral quant

* speed up qwen batch processing

* fix qlora training

* fix dataloader

* formatting

* fix pixtral pixel loading

* fix lora and dataset

* add batch processing support for qwen2_vl

* update lora docs

* add unit tests

* set stage for phi3_v support

* update logs and readme

* add utils tests and remove unused collate fn

* refactor prompt utils and add multi-image support for pixtral

* add llava interleave support

* multi image support

* add image resizing

* refactor data loading

* update data processing and tqdm

* add llava interleave

* formatting

* add list of models with multi-image support

* remove trimmed labels

* remove warning

* pin reqs

* add config dict condition

* fix pixtral FT prompt

* formatting images

* remove unused

* update trainer init

* update lora

* update md and formatting

* bump version

* add tests for pixtral and qwen2_vl

* add tests for pixtral

* Merge branch 'pc/tuner' of https://github.com/Blaizzy/mlx-vlm into pc/tuner

* fix test

* remove rope scaling

* remove test args and update MD

* format dataset defaults

* add dataset formatting info

* Fix issues with multiple image handling (#78)

1. [IMG_BREAK] and [IMG_END] tokens are lost after embedding.
2. Image position encoding should be done on a per-image basis:
    https://github.com/mistralai/mistral-inference/blob/main/src/mistral_inference/vision_encoder.py#L85
    https://github.com/huggingface/transformers/blob/main/src/transformers/models/pixtral/modeling_pixtral.py#L492

Co-authored-by: Roger Xu <rogerxu@gmail.com>

* fix styling

* update model

* update default model

* rewrite comments

---------

Co-authored-by: hiima234 <98786318+hiima234@users.noreply.github.com>
Co-authored-by: Roger Xu <rogerxu@gmail.com>
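The two fixes described in the commit message above can be sketched as follows. This is a minimal, hypothetical illustration (the token ids, function names, and patch size are assumptions, not the actual mlx-vlm API): image-patch embeddings replace only the [IMG] placeholder positions, so the embeddings of [IMG_BREAK] and [IMG_END] survive the merge, and 2D patch positions are computed independently for each image rather than numbered globally.

```python
import numpy as np

# Assumed token ids for illustration only.
IMG_TOKEN, IMG_BREAK, IMG_END = 10, 12, 13

def merge_image_embeddings(input_ids, text_embeds, image_embeds):
    """Splice image-patch embeddings into the text embedding sequence.

    Only the [IMG] placeholder positions are overwritten, so the
    embeddings at [IMG_BREAK] and [IMG_END] positions are preserved
    (the bug was overwriting the whole image span, losing them).
    """
    merged = text_embeds.copy()
    img_positions = np.where(input_ids == IMG_TOKEN)[0]
    assert len(img_positions) == image_embeds.shape[0]
    merged[img_positions] = image_embeds
    return merged

def per_image_position_ids(image_sizes, patch=16):
    """Compute 2D (row, col) patch positions per image.

    Each image's patch grid restarts at (0, 0), rather than continuing
    a single global numbering across all images in the sequence.
    """
    ids = []
    for h, w in image_sizes:
        rows, cols = h // patch, w // patch
        grid = np.stack(
            np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"),
            axis=-1,
        )
        ids.append(grid.reshape(-1, 2))
    return ids
```

With two images, `per_image_position_ids([(32, 32), (16, 48)])` yields separate position lists, each starting at `(0, 0)`, matching the per-image encoding used in the referenced mistral-inference and transformers sources.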
Garry-TI pushed a commit to Garry-TI/mlx-vlm that referenced this pull request Sep 23, 2025
