feat: Implement embed_image() #5101
Greptile Summary

This PR implements comprehensive image embedding functionality for the Daft framework, extending the existing text-only AI capabilities to support multimodal operations. The implementation follows the established provider architecture pattern used by text embeddings, adding a new `embed_image()` function that works with transformer-based models like CLIP.
The changes introduce several key components (a rough sketch of these interfaces follows after the summary):

- **Core Image Embedding Infrastructure**: A new `ImageEmbedder` protocol and `ImageEmbedderDescriptor` abstract class in `protocols.py` that mirror the existing text embedding pattern, providing a standardized interface for image embedding implementations.
- **Transformers Provider Support**: A complete `TransformersImageEmbedder` implementation that leverages HuggingFace transformers models with automatic device selection (CUDA/MPS/CPU), proper PIL image conversion, and batch processing capabilities. The implementation defaults to the `openai/clip-vit-base-patch32` model.
- **Provider Architecture Extensions**: Updates to the provider system, including a new `load_transformers()` function, the addition of transformers to the PROVIDERS registry, and an implementation of the abstract `get_image_embedder()` method across all providers. Non-supporting providers (OpenAI and SentenceTransformers) raise `NotImplementedError` with descriptive messages.
- **Expression Layer**: An `_ImageEmbedderExpression` class that integrates with Daft's UDF system, enabling image embedding to work seamlessly within dataframe operations.
- **Public API**: The main `embed_image()` function in `daft.functions.ai`, giving users a simple interface for image embedding operations and following the same pattern as the existing `embed_text()` function.
The implementation maintains API consistency across the framework while extending capabilities to support computer vision tasks. All changes follow the established patterns for dependency management, error handling, and provider resolution that users are already familiar with from text embedding operations.
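For orientation, here is a rough sketch of how these pieces could fit together, using the names from the summary above. The exact signatures in the PR may differ; the `instantiate()` method name, the `pick_device()` helper, and the type placeholders are illustrative assumptions, not the PR's actual code.

```python
from typing import Any, Protocol

import numpy as np

# Placeholder aliases; see the TypeAlias discussion later in this thread.
Image = np.ndarray
Embedding = np.ndarray


class ImageEmbedder(Protocol):
    """Mirrors the existing text embedder protocol, but for images."""

    def embed_image(self, images: list[Image]) -> list[Embedding]: ...


class ImageEmbedderDescriptor:
    """Lightweight, serializable description of an embedder. The heavyweight
    model object is built lazily (e.g. on a worker) rather than shipped with
    the query plan."""

    def __init__(self, model: str, options: dict[str, Any]) -> None:
        self.model = model
        self.options = options

    def instantiate(self) -> ImageEmbedder:  # assumed method name
        raise NotImplementedError


def pick_device() -> str:
    """Assumed device-selection order from the summary: CUDA, then MPS, then CPU."""
    import torch

    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```

The descriptor split is also what lets non-supporting providers simply raise `NotImplementedError` from `get_image_embedder()`, while the transformers provider returns a `TransformersImageEmbedderDescriptor`.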
Confidence score: 4/5
- This PR introduces significant new functionality but follows well-established patterns from the existing text embedding implementation
- Score reflects the complexity of the multimodal AI integration and potential for device/dependency-related issues
- Pay close attention to the transformers image embedder implementation and provider registration changes
9 files reviewed, 5 comments
tests/ai/test_transformers.py (Outdated)

```python
with pytest.raises(ImportError, match="Pillow is required for image processing but not available"):
    test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
    (
        daft.from_pydict({"image": [test_image]})
        .select(daft.col("image").cast(daft.DataType.image()))
        .select(embed_image(daft.col("image")))
    )
```
logic: The test creates a dataframe but never executes it; with lazy evaluation, the PIL check would only happen on execution.
Suggested change:

```diff
 with pytest.raises(ImportError, match="Pillow is required for image processing but not available"):
     test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
     (
         daft.from_pydict({"image": [test_image]})
         .select(daft.col("image").cast(daft.DataType.image()))
         .select(embed_image(daft.col("image")))
+        .collect()
     )
```
Incorrect. I specifically want to check that the PIL image check happens without needing to execute the query.
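For context on the distinction being debated here: in a lazy dataframe API, errors raised inside the UDF body surface only at `collect()`, while errors raised while the expression is being constructed surface immediately, with no execution needed. A toy illustration of the two failure points (the `LazyQuery` class and helper names are hypothetical, not Daft's internals):

```python
import importlib.util


def pil_available() -> bool:
    # Cheap availability probe that does not import the package.
    return importlib.util.find_spec("PIL") is not None


class LazyQuery:
    """Toy lazy query: select() only stages work; collect() runs it."""

    def __init__(self) -> None:
        self._stages = []

    def select(self, fn):
        self._stages.append(fn)  # nothing executes here
        return self

    def collect(self):
        for fn in self._stages:
            fn()  # execution-time errors surface only now
        return self


def embed_image_expr():
    # Construction-time check: fails as soon as the expression is built,
    # which is the behavior the author wants the test to assert.
    if not pil_available():
        raise ImportError("Pillow is required for image processing but not available")

    def run():
        import PIL.Image  # noqa: F401  # execution-time failure point

    return run
```

Under this model, wrapping just `LazyQuery().select(embed_image_expr())` in `pytest.raises(ImportError)` passes without any `collect()`, matching the author's reply above.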
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #5101      +/-   ##
==========================================
+ Coverage   75.27%   76.54%    +1.26%
==========================================
  Files         949      952        +3
  Lines      132520   130581     -1939
==========================================
+ Hits        99761    99948      +187
+ Misses      32759    30633     -2126
```
daft/ai/transformers/__init__.py (Outdated)

```python
def get_image_embedder(self, model: str | None = None, **options: Any) -> ImageEmbedderDescriptor:
    # Raise an error early if PIL is not available.
    if not pil_image.module_available():
        raise ImportError("Pillow is required for image processing but not available")

    return TransformersImageEmbedderDescriptor(model or "openai/clip-vit-base-patch32", options)
```
You should just depend on PIL directly in the module; the `ProviderImportError` within `load_transformers()` will handle the appropriate error messaging. PIL will need to be added to that dep list, and I will leave a comment there as well.
Ah perfect
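For reference, a hedged sketch of the dependency-gating pattern the reviewer describes. `ProviderImportError` and `load_transformers()` are named in the comment, but their bodies here, and the exact dep list, are assumptions for illustration only:

```python
import importlib.util


class ProviderImportError(ImportError):
    """Assumed shape: one consistent error naming the provider's missing deps."""

    def __init__(self, provider: str, missing: list[str]) -> None:
        super().__init__(
            f"Provider '{provider}' requires missing dependencies: {', '.join(missing)}"
        )


def load_transformers():
    # With PIL added to the provider's dep list (assumed here), a missing
    # Pillow install is reported once at provider load time, so
    # get_image_embedder() no longer needs its own ad-hoc ImportError.
    missing = [
        dep
        for dep in ("torch", "transformers", "PIL")  # assumed dep list
        if importlib.util.find_spec(dep) is None
    ]
    if missing:
        raise ProviderImportError("transformers", missing)
    return importlib.import_module("daft.ai.transformers")
```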
```python
    Image: TypeAlias = np.ndarray[Any, Any]
else:
    Embedding: TypeAlias = Any
    Image: TypeAlias = Any
```
Nice, thanks. A little 'eh, why?' right now, but you can see how, once the typing work catches up, daft will have its own "Image" and "Embedding" types that are np.ndarray-compatible via https://numpy.org/devdocs/reference/arrays.interface.html
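To make that concrete: any object exposing `__array_interface__` can be consumed by `np.asarray` as a zero-copy view, which is how a future daft `Image` or `Embedding` type could stay np.ndarray-compatible. A tiny hypothetical sketch (not Daft's actual class):

```python
import numpy as np


class DaftImage:
    """Hypothetical Image type that is np.ndarray-compatible via the
    NumPy array interface protocol; not Daft's actual class."""

    def __init__(self, pixels: np.ndarray) -> None:
        self._pixels = np.ascontiguousarray(pixels)

    @property
    def __array_interface__(self) -> dict:
        # Delegating to the underlying buffer is enough for np.asarray()
        # to view this object as an ndarray without copying.
        return self._pixels.__array_interface__


img = DaftImage(np.zeros((2, 2, 3), dtype=np.uint8))
arr = np.asarray(img)  # zero-copy view through the interface
assert arr.shape == (2, 2, 3)
```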
## Changes Made

Adds support for `embed_image()`, e.g.

```python
import daft
from daft.functions.ai import embed_image
import numpy as np

provider = "transformers"
model = "openai/clip-vit-base-patch32"

test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
(
    daft.from_pydict({"image": [test_image] * 16})
    .select(daft.col("image").cast(daft.DataType.image()))
    .select(embed_image(daft.col("image"), provider=provider, model=model))
    .show()
)
```

**!! Currently only supports OpenAI CLIP models: https://huggingface.co/docs/transformers/en/model_doc/clip**

Co-authored-by: R. C. Howell <[email protected]>