feat: multimodal endpoint for image to text #144

blefo · 2025-08-25T17:45:10Z

This PR adds multimodal support to the chat completion endpoint

The chat completion endpoint now accepts images in base64 format, enabling text + image model inputs.
The request body format aligns with the OpenAI client specification.
Web search is updated to support both web search and multimodal inputs.

… + image format validator

…o google/gemma-3-4b-it

…r tests + added multimodal e2e tests

packages/nilai-common/src/nilai_common/api_model.py

nilai-api/src/nilai_api/handlers/web_search.py

tests/e2e/config.py

jcabrero

You need to check the API Model changes as they break tool calling and make them more similar to what OpenAI support for image_url is.

The PR is overall good 👍 Good job.

…l and ignore it

… content parts

…tent from chat completions

…h user queries

…nhance system message updates

… user query from messages

…dal completion tests

… modules

…ndling for multimodal content

…d improved error handling for multimodal content

jcabrero · 2025-09-01T14:15:58Z

packages/nilai-common/src/nilai_common/api_model.py

-class Message(ChatCompletionMessage):
-    role: Literal["system", "user", "assistant", "tool"]  # type: ignore


class Message (ChatCompletionMessageParam): pass

blefo added 9 commits August 25, 2025 10:07

feat: added gema 27b

f4c8197

feat: implement multimodal content support with image URL validation

617a590

refactor: added multimodal parameter + web search with image in query…

941f784

… + image format validator

refactor: update chat completion message structure and change model t…

a74acbc

…o google/gemma-3-4b-it

feat: add Docker Compose configuration for gemma-4b in ci pipeline fo…

7fe8325

…r tests + added multimodal e2e tests

fix: ruff format

ae4422a

refactor: remove unused import in e2e and unit tests

7c8b61e

test: add rate limit checks to multimodal chat completion tests

92e8ece

fix: ruff format

27a91b9

jcabrero reviewed Aug 26, 2025

View reviewed changes

packages/nilai-common/src/nilai_common/api_model.py Outdated Show resolved Hide resolved

jcabrero reviewed Aug 26, 2025

View reviewed changes

nilai-api/src/nilai_api/handlers/web_search.py Show resolved Hide resolved

jcabrero reviewed Aug 26, 2025

View reviewed changes

nilai-api/src/nilai_api/handlers/web_search.py Outdated Show resolved Hide resolved

jcabrero reviewed Aug 26, 2025

View reviewed changes

tests/e2e/config.py Outdated Show resolved Hide resolved

jcabrero requested changes Aug 26, 2025

View reviewed changes

blefo added 16 commits August 26, 2025 11:50

refactor: update message model structure

2fcce35

fix: ruff format

3e3cc56

test: enhance chat completion tests with multimodal model integration

2a9669c

fix: web search + multimodal with 3 sources

3c62952

chore: stop tracking docker/compose/docker-compose.gemma-4b-gpu.ci.ym…

a52ec27

…l and ignore it

refactor: clean up imports in tests

ad4fa11

fix: add type ignore for role in Message class

f0e5848

refactor: update Message class content type to use new ChatCompletion…

338a5ec

… content parts

feat: add content extractor utility for processing text and image con…

e85a711

…tent from chat completions

refactor: improve web search handling and enhance message context wit…

758d809

…h user queries

refactor: integrate content extraction into user query handling and e…

fc9ea5d

…nhance system message updates

feat: add functions to handle multimodal content and extract the last…

758dc0d

… user query from messages

refactor: remove deprecated image support handler and enhance multimo…

eaf2e96

…dal completion tests

refactor: clean up unused imports in web search and content extractor…

7973642

… modules

feat: implement chat completion tests with image support and error ha…

f9b71cd

…ndling for multimodal content

feat: enhance chat completion tests with rate limit configurations an…

746edbf

…d improved error handling for multimodal content

blefo added 27 commits August 29, 2025 16:19

test#2: remove llama-1b

da0602f

fix: update the script for gemma

fccbd1d

fix#2

03ca9eb

fix: add service startup logs

359f518

fix: update gemma ci config

967eace

fix: added logs for services

8b5a073

fix: gemma config

e9cc0da

fix: gemma config

3dcd4e5

fix: gemma config

86c8e0a

fix: gemma config

cfc2e07

fix: gemma config

96f78af

fix: gemma config

f1c7b4d

fix: gemma config

f4451ca

fix: update gemma config

25bea10

fix: gemma config

bd8ba99

fix: gemma config

eb4dbe0

fix: use qwen-2b instead of gemma-4b for ci pipeline

eb3f3de

fix: update qwen config

b0f36c6

fix: qwen config

7c2b140

fix: update qwen config

4375c03

fix: config as list

471c7cb

fix: qwen config

a842da8

fix: avoid parsing error

9115580

fix: qwen config format

30fb2c9

fix: update config

aa43f24

fix: update config

8f6b06e

fix: enfore eager

244d14d

jcabrero reviewed Sep 1, 2025

View reviewed changes

fix: api model fixes

fd9ab43

jcabrero linked an issue Oct 8, 2025 that may be closed by this pull request

Add new models to the catalogue #128

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: multimodal endpoint for image to text #144

feat: multimodal endpoint for image to text #144

Uh oh!

blefo commented Aug 25, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jcabrero left a comment

Uh oh!

jcabrero Sep 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		class Message(ChatCompletionMessage):
		role: Literal["system", "user", "assistant", "tool"] # type: ignore

feat: multimodal endpoint for image to text #144

Are you sure you want to change the base?

feat: multimodal endpoint for image to text #144

Uh oh!

Conversation

blefo commented Aug 25, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jcabrero left a comment

Choose a reason for hiding this comment

Uh oh!

jcabrero Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants