
Conversation


@skhorasganiTT skhorasganiTT commented Sep 12, 2025

vLLM nightly tests - https://github.com/tenstorrent/tt-metal/actions/runs/17680447236

FYI @ppetrovicTT, @rdraskicTT, added as optional reviewers (I realize this is hard to review; the main changes to the TT backend are those mentioned above)

ilmarkov and others added 30 commits July 11, 2025 18:58
Signed-off-by: Trevor Morris <[email protected]>
Signed-off-by: mgoin <[email protected]>
Co-authored-by: mgoin <[email protected]>
Signed-off-by: NickLucche <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>

@ppetrovicTT ppetrovicTT left a comment


Yep, it's hard to review. I trust you :)

github-merge-queue bot pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 15, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

### Ticket
[N/A](#27285)

### Problem description
- Legacy input mappers/processors were removed from vLLM V0
(vllm-project/vllm#15686,
vllm-project/vllm#10114). These changes are
required to maintain compatibility of existing integrated models after
pulling upstream changes in
tenstorrent/vllm#172.

### What's changed
- Removed legacy vLLM input processors from Llama3, Gemma3, Qwen2.5-VL
- Defined new multi-modal input processor classes for
Llama3.2-11B-Vision (`MllamaMultiModalProcessor`) and Gemma3 / Qwen2.5-VL
(`MultiModalProcessor`), and added support for multi-modal limits for each
- Moved the max seq len assertion for Llama8B to model initialization;
`--max_model_len` must be set on the vLLM side for any model which supports
less than the default max context length (see the sketch after this list)
- Fixed a bug where the `create_multimodal_model` import was removed for
Llama3.2-11B-Vision, breaking the model (from
87b758d)
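
For reference, a minimal sketch of how these settings are exercised from the
vLLM side using the offline `LLM` API; the model name, prompt format, image
path, and limit values below are illustrative placeholders and not part of
this change:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Placeholder model and limits; any integrated multi-modal model applies.
llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    max_model_len=8192,                # set when the model supports less than the default context length
    limit_mm_per_prompt={"image": 1},  # per-prompt multi-modal limit consumed by the new processors
)

image = Image.open("example.jpg")      # placeholder image
outputs = llm.generate(
    {"prompt": "<|image|>Describe this image.", "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```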

### Checklist
- [x] [All post
commit](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml)
CI passes
- [x] [Blackhole Post
commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml)
CI with demo tests passes (if applicable)
- [x] [Model
regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml)
CI passes (if applicable)
- [x] [Device performance
regression](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml)
CI passes (if applicable)
- [x] (For models and ops writers) [Single-card demo
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml)
CI passes (if applicable) See [recommended dev
flow](https://github.com/tenstorrent/tt-metal/blob/main/models/docs/MODEL_ADD.md#a-recommended-dev-flow-on-github-for-adding-new-models).
- [x] [Galaxy
quick](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml)
CI passes (if applicable)
- [x] [Galaxy demo tests, for
Llama](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml)
CI passes, if applicable, because of current Llama work
- [x] (For runtime and ops writers) [T3000 unit
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml)
CI passes (if applicable, since this is run on push to main)
- [x] (For models and ops writers) [T3000 demo
tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml)
CI passes (if applicable, since this is required for release)
- [x] New/Existing tests provide coverage for changes

vLLM nightly tests -
https://github.com/tenstorrent/tt-metal/actions/runs/17680447236

---------

Signed-off-by: Salar <[email protected]>
Co-authored-by: Igor Djuric <[email protected]>
@skhorasganiTT skhorasganiTT merged commit 99a3e13 into dev Sep 15, 2025
2 checks passed
@skhorasganiTT skhorasganiTT deleted the skhorasgani/pull_upstream_july22_2 branch September 15, 2025 11:17
@skhorasganiTT skhorasganiTT restored the skhorasgani/pull_upstream_july22_2 branch September 15, 2025 13:22
dimitri-tenstorrent pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 15, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

DorsaRoh pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 15, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

subinleeTT pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 17, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

ign-febin pushed a commit to ign-saurav/tt-metal that referenced this pull request Sep 22, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (tenstorrent#28406)

yugi957 pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 23, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

mdjuricTT pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 26, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)

ricozhu-TT pushed a commit to tenstorrent/tt-metal that referenced this pull request Sep 26, 2025
…uly22 upstream changes - removed legacy input processors and refactored for multi-modal models (#28406)
