Support soft_prompt or inputs_embeds? #267

jessiewiswjc · 2023-12-28T08:25:02Z

Does triton-inference-server support multi-modal models such as the BLIP2 example in TensorRT-LLM (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/blip2)?

juney-nvidia · 2024-01-01T03:37:05Z

@jessiewiswjc

Hi, although we haven't provided an example of the BLIP2 pipeline in the Triton backend repo, the entire inference sequence can be organized in sequential order, so there should not be any blockers preventing you from calling run.py in the Triton backend.
Did you run into any specific issue?

June
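
For readers following the thread, here is a minimal sketch of what "organized in sequential order" could look like for a BLIP2-style flow: an image stage produces visual embeddings, which are then prepended to the text embeddings as a soft prompt before the language model runs. Every function name below is a hypothetical stand-in for illustration; none of them is a TensorRT-LLM or Triton API, and the toy arithmetic only mimics the shapes involved.

```python
from typing import List

Vector = List[float]


def encode_image(pixels: List[float], num_query_tokens: int, hidden: int) -> List[Vector]:
    """Hypothetical stand-in for the vision encoder + Q-Former stage.

    Returns `num_query_tokens` vectors of size `hidden`.
    """
    mean = sum(pixels) / len(pixels)
    return [[mean + i] * hidden for i in range(num_query_tokens)]


def embed_text(token_ids: List[int], hidden: int) -> List[Vector]:
    """Hypothetical stand-in for the LLM token-embedding lookup."""
    return [[float(t)] * hidden for t in token_ids]


def build_inputs_embeds(visual: List[Vector], text: List[Vector]) -> List[Vector]:
    """Prepend the visual embeddings (the soft prompt) to the text embeddings."""
    if visual and text and len(visual[0]) != len(text[0]):
        raise ValueError("visual and text embeddings must share the hidden size")
    return visual + text


def run_pipeline(pixels: List[float], token_ids: List[int],
                 hidden: int = 4, num_query_tokens: int = 2) -> List[Vector]:
    """Run the stages one after another, as the reply above suggests."""
    visual = encode_image(pixels, num_query_tokens, hidden)  # stage 1: image
    text = embed_text(token_ids, hidden)                     # stage 2: text
    return build_inputs_embeds(visual, text)                 # stage 3: LLM input


if __name__ == "__main__":
    embeds = run_pipeline([0.0, 2.0], [5, 6, 7])
    print(len(embeds), len(embeds[0]))
```

In a real deployment each stage would be a separate model or preprocessing step, and the combined sequence would be handed to the language model in place of plain token IDs.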

jessiewiswjc · 2024-01-24T06:58:09Z

> @jessiewiswjc
>
> Hi, although we haven't provided an example of the BLIP2 pipeline in the Triton backend repo, the entire inference sequence can be organized in sequential order, so there should not be any blockers preventing you from calling run.py in the Triton backend. Did you run into any specific issue?
>
> June

I am sorry to hear about NVIDIA/TensorRT-LLM#695 (comment). Does the preprocessing in Triton not support multi-modal models?

juney-nvidia self-assigned this Jan 1, 2024

juney-nvidia added the question (Further information is requested) and triaged (Issue has been triaged by maintainers) labels Jan 1, 2024
