Add Llama 3.2 Vision inference service with CPU-only Q4_K_M quantization #12

Copilot · 2026-01-13T21:26:44Z

Implementation Plan: Llama 3.2 Vision Docker with Flask Webhook

Latest Changes:

- Added Literal types to ErrorResponse.error_type and HealthResponse.status
- Ensures Python runtime validation matches JSON schema enum constraints
- Updated docstring to clarify models.py is for validation, not schema definition
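
A minimal sketch of what constraining these fields with Literal types buys at runtime. It uses only the standard library (dataclasses plus `typing.get_args`) in place of Pydantic so that it runs without dependencies. The enum values and the `check_literal` helper are assumptions for illustration; the actual values in models.py and response_schema.json are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical enum values; the real ones are defined in response_schema.json.
ErrorType = Literal["validation_error", "inference_error"]
HealthStatus = Literal["healthy", "unhealthy"]


def check_literal(value: str, literal_type, field_name: str) -> str:
    """Raise ValueError unless value is one of the Literal's allowed members."""
    allowed = get_args(literal_type)
    if value not in allowed:
        raise ValueError(f"{field_name} must be one of {allowed}, got {value!r}")
    return value


@dataclass
class ErrorResponse:
    error: str
    error_type: ErrorType

    def __post_init__(self) -> None:
        # Plain dataclasses do not enforce annotations, so check explicitly.
        check_literal(self.error_type, ErrorType, "error_type")


@dataclass
class HealthResponse:
    status: HealthStatus

    def __post_init__(self) -> None:
        check_literal(self.status, HealthStatus, "status")
```

With Pydantic, annotating a model field with a `Literal` type gives the same rejection at model construction, without the explicit `__post_init__` check.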

Complete Implementation:
All requirements met with proper schema definition (JSON), runtime validation (Pydantic with type constraints), and comprehensive documentation.

Original prompt

Set up a Docker image that runs Llama 3.2 Vision (use an env var for the model) on CPU only (no GPU or NVIDIA acceleration), using the quantization recommended in the comments on ggml-org/llama.cpp#5780.

Expose the prompt through a Flask webhook interface that accepts images and text in prompts and returns the response as JSON.

Prepare a schema for the response format.
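
The request handling described in the prompt can be sketched as a framework-agnostic function that a Flask route could call. This is an illustration under stated assumptions, not the code from the PR: the payload keys (`prompt`, `image`), the response fields, the `MODEL_NAME` variable, and the `run_model` callback are all hypothetical, and the model call is stubbed out.

```python
import base64
import binascii
import json
import os

# Hypothetical variable name and default; the prompt only says "use env var for model".
MODEL_NAME = os.environ.get("MODEL_NAME", "llama-3.2-vision")


def error_body(message: str) -> str:
    """Serialize a validation error in an assumed response shape."""
    return json.dumps({"error": message, "error_type": "validation_error"})


def handle_prompt(payload: dict, run_model=None) -> tuple[str, int]:
    """Validate a {"prompt": str, "image": base64 str} payload.

    Returns (json_body, http_status).
    """
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        return error_body("prompt is required"), 400

    image_bytes = None
    if payload.get("image") is not None:
        try:
            # validate=True rejects characters outside the base64 alphabet.
            image_bytes = base64.b64decode(payload["image"], validate=True)
        except (binascii.Error, ValueError, TypeError):
            return error_body("image must be base64-encoded"), 400

    # run_model stands in for the real inference call; the default is a stub.
    generate = run_model or (lambda text, image: f"(stub reply to: {text})")
    body = {"response": generate(prompt, image_bytes), "model": MODEL_NAME}
    return json.dumps(body), 200
```

In a Flask route, the parsed JSON body of the request would be passed in as `payload`, and the returned string and status code sent back with a JSON content type.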

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

wpowiertowski

The schema file should be a JSON file representing the expected response format, but it looks like you implemented some model logic there.

- Add response_schema.json with formal JSON schema definitions
- Rename schema.py to models.py to clarify it contains internal validation logic
- Update README to reference the JSON schema file
- Update documentation with JSON examples instead of Python type hints

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Copilot · 2026-01-13T21:50:51Z

The schema file should be a JSON file representing the expected response format, but it looks like you implemented some model logic there.

Fixed in b49c532. Created response_schema.json with formal JSON Schema definitions and renamed the Python file to models.py to clarify it's for internal validation logic, not the schema definition.
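
To illustrate the split between a JSON Schema definition and Python validation code, here is a rough sketch. The schema excerpt is hypothetical (the real response_schema.json is not reproduced in this thread), and the `violations` helper checks only `required` and `enum`, which is a small fraction of what a full JSON Schema validator covers.

```python
import json

# Hypothetical excerpt in the spirit of response_schema.json.
RESPONSE_SCHEMA = json.loads("""
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "definitions": {
    "ErrorResponse": {
      "type": "object",
      "required": ["error", "error_type"],
      "properties": {
        "error": {"type": "string"},
        "error_type": {
          "type": "string",
          "enum": ["validation_error", "inference_error"]
        }
      }
    }
  }
}
""")


def violations(instance: dict, definition: dict) -> list[str]:
    """Return human-readable problems, checking only `required` and `enum`."""
    problems = [
        f"missing field: {name}"
        for name in definition.get("required", [])
        if name not in instance
    ]
    for name, rules in definition.get("properties", {}).items():
        if name in instance and "enum" in rules and instance[name] not in rules["enum"]:
            problems.append(f"{name}: {instance[name]!r} is not an allowed value")
    return problems
```

The schema stays pure data that any client can consume, while the checking logic lives in Python; that is the separation the review comment asks for.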

- Use Literal types for error_type and status fields
- Ensures runtime validation matches JSON schema constraints
- Update docstring to reference response_schema.json

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Initial plan

f5744e7

Copilot AI assigned Copilot and wpowiertowski Jan 13, 2026

Copilot started work on behalf of wpowiertowski January 13, 2026 21:27

Copilot AI and others added 4 commits January 13, 2026 21:31

Add Llama 3.2 Vision docker setup with Flask webhook API

7aff62a

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Add CLIP model support for vision features and update documentation

620e2ff

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Fix security issues, resource handling, and dependency versions

556c1b9

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Remove unused import and update docker-compose format

dbe653c

Co-authored-by: wpowiertowski <671688+wpowiertowski@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Setup Docker image for Llama 3.2 vision with Flask webhook~~ Add Llama 3.2 Vision inference service with CPU-only Q4_K_M quantization Jan 13, 2026

Copilot AI requested a review from wpowiertowski January 13, 2026 21:42

Copilot finished work on behalf of wpowiertowski January 13, 2026 21:42

wpowiertowski requested changes Jan 13, 2026

Copilot started work on behalf of wpowiertowski January 13, 2026 21:48

Copilot finished work on behalf of wpowiertowski January 13, 2026 21:53

Copilot AI requested a review from wpowiertowski January 13, 2026 21:53

wpowiertowski approved these changes Jan 13, 2026

wpowiertowski marked this pull request as ready for review January 13, 2026 22:21

wpowiertowski merged commit 8105e3f into main Jan 13, 2026

wpowiertowski deleted the copilot/setup-docker-image-llama-vision branch January 17, 2026 01:28
