
Conversation

**@desmondcheongzx** (Collaborator) commented on Aug 31, 2025

Changes Made

Adds support for embed_image(), e.g.

```python
import daft
from daft.functions.ai import embed_image

import numpy as np

provider = "transformers"
model = "openai/clip-vit-base-patch32"
test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)

(
    daft.from_pydict({"image": [test_image] * 16})
    .select(daft.col("image").cast(daft.DataType.image()))
    .select(embed_image(daft.col("image"), provider=provider, model=model))
    .show()
)
```

!! Currently only supports OpenAI CLIP models: https://huggingface.co/docs/transformers/en/model_doc/clip

@github-actions github-actions bot added the feat label Aug 31, 2025
**@greptile-apps** (bot) left a comment
Greptile Summary

This PR implements comprehensive image embedding functionality for the Daft framework, extending the existing text-only AI capabilities to support multimodal operations. The implementation follows the established provider architecture pattern used by text embeddings, adding a new embed_image() function that works with transformer-based models like CLIP.

The changes introduce several key components:

  1. Core Image Embedding Infrastructure: A new ImageEmbedder protocol and ImageEmbedderDescriptor abstract class in protocols.py that mirrors the existing text embedding pattern, providing a standardized interface for image embedding implementations.

  2. Transformers Provider Support: A complete TransformersImageEmbedder implementation that leverages HuggingFace transformers models with automatic device selection (CUDA/MPS/CPU), proper PIL image conversion, and batch processing capabilities. The implementation defaults to the 'openai/clip-vit-base-patch32' model.

  3. Provider Architecture Extensions: Updates to the provider system including a new load_transformers() function, addition of transformers to the PROVIDERS registry, and implementation of the abstract get_image_embedder() method across all providers. Non-supporting providers (OpenAI and SentenceTransformers) properly raise NotImplementedError with descriptive messages.

  4. Expression Layer: An _ImageEmbedderExpression class that integrates with Daft's UDF system, enabling the image embedding functionality to work seamlessly within dataframe operations.

  5. Public API: The main embed_image() function in daft.functions.ai that provides users with a simple interface for image embedding operations, following the same pattern as the existing embed_text() function.

The implementation maintains API consistency across the framework while extending capabilities to support computer vision tasks. All changes follow the established patterns for dependency management, error handling, and provider resolution that users are already familiar with from text embedding operations.
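The protocol-plus-descriptor pattern described in the summary can be sketched as follows. The `ImageEmbedder` name and `embed_image` method come from the summary above, but the bodies are illustrative stand-ins, not Daft's actual implementation: a real `TransformersImageEmbedder` would pick a device (CUDA/MPS/CPU), convert arrays to PIL images, and run the CLIP image tower in batches, where this toy version just mean-pools pixels.

```python
from __future__ import annotations

from typing import Protocol

import numpy as np


class ImageEmbedder(Protocol):
    """Embeds a batch of images (HWC uint8 arrays) into fixed-size vectors."""

    def embed_image(self, images: list[np.ndarray]) -> list[np.ndarray]: ...


class ToyClipEmbedder:
    """Illustrative stand-in: projects flattened pixels to `dim` values
    and L2-normalizes, mimicking only the shape contract of a CLIP embedding."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed_image(self, images: list[np.ndarray]) -> list[np.ndarray]:
        out = []
        for img in images:
            flat = img.astype(np.float32).ravel()
            vec = np.resize(flat, self.dim)  # pad/trim to the embedding size
            norm = float(np.linalg.norm(vec))
            out.append(vec / norm if norm else vec)
        return out


# Structural typing: ToyClipEmbedder satisfies the protocol without inheriting from it.
embedder: ImageEmbedder = ToyClipEmbedder(dim=8)
vectors = embedder.embed_image([np.zeros((100, 100, 3), dtype=np.uint8)])
```

The point of the protocol layer is exactly this structural decoupling: providers register concrete embedders without importing each other.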

Confidence score: 4/5

  • This PR introduces significant new functionality but follows well-established patterns from the existing text embedding implementation
  • Score reflects the complexity of the multimodal AI integration and potential for device/dependency-related issues
  • Pay close attention to the transformers image embedder implementation and provider registration changes

9 files reviewed, 5 comments


Comment on lines 103 to 109:

```python
with pytest.raises(ImportError, match="Pillow is required for image processing but not available"):
    test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
    (
        daft.from_pydict({"image": [test_image]})
        .select(daft.col("image").cast(daft.DataType.image()))
        .select(embed_image(daft.col("image")))
    )
```
**@greptile-apps** (bot) commented:
logic: Test creates dataframe but doesn't execute it - the lazy evaluation means PIL check only happens on execution

Suggested change (add `.collect()` to force execution):

```python
with pytest.raises(ImportError, match="Pillow is required for image processing but not available"):
    test_image = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
    (
        daft.from_pydict({"image": [test_image]})
        .select(daft.col("image").cast(daft.DataType.image()))
        .select(embed_image(daft.col("image")))
        .collect()
    )
```
**@desmondcheongzx** (Collaborator, Author) replied:
Incorrect. I specifically want to check that the PIL availability check happens without needing to execute the query.
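The distinction under discussion, validation at expression-construction time versus at execution time, can be illustrated with a minimal sketch. Here `module_available` stands in for Daft's `pil_image.module_available()` helper, and `EagerImageExpression` is a hypothetical name, not a Daft class.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if `name` could be imported, without actually importing it."""
    return importlib.util.find_spec(name) is not None


class EagerImageExpression:
    """Checks its optional dependency when the expression is *built*, so the
    ImportError surfaces immediately rather than on lazy query execution."""

    def __init__(self, required: str = "PIL") -> None:
        if not module_available(required):
            raise ImportError(
                f"{required} is required for image processing but not available"
            )


# Constructing the expression alone is enough to trigger the check:
try:
    EagerImageExpression(required="definitely_not_installed_xyz")
except ImportError as exc:
    message = str(exc)
```

This is why the test above deliberately omits `.collect()`: with eager validation, building the plan is the behavior under test.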

**codecov** bot commented on Sep 1, 2025

Codecov Report

❌ Patch coverage is 87.36842% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.54%. Comparing base (ae638e5) to head (9a6368c).
⚠️ Report is 31 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| daft/ai/_expressions.py | 57.14% | 3 Missing ⚠️ |
| daft/functions/ai/__init__.py | 57.14% | 3 Missing ⚠️ |
| daft/ai/provider.py | 75.00% | 2 Missing ⚠️ |
| daft/ai/transformers/__init__.py | 90.47% | 2 Missing ⚠️ |
| daft/ai/openai/__init__.py | 50.00% | 1 Missing ⚠️ |
| daft/ai/sentence_transformers/__init__.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #5101      +/-   ##
==========================================
+ Coverage   75.27%   76.54%   +1.26%
==========================================
  Files         949      952       +3
  Lines      132520   130581    -1939
==========================================
+ Hits        99761    99948     +187
+ Misses      32759    30633    -2126
```
| Files with missing lines | Coverage Δ |
|---|---|
| daft/ai/protocols.py | 100.00% <100.00%> (ø) |
| daft/ai/sentence_transformers/text_embedder.py | 100.00% <100.00%> (+100.00%) ⬆️ |
| daft/ai/transformers/image_embedder.py | 100.00% <100.00%> (ø) |
| daft/ai/openai/__init__.py | 94.73% <50.00%> (-5.27%) ⬇️ |
| daft/ai/sentence_transformers/__init__.py | 88.88% <50.00%> (+88.88%) ⬆️ |
| daft/ai/provider.py | 69.23% <75.00%> (+1.48%) ⬆️ |
| daft/ai/transformers/__init__.py | 90.47% <90.47%> (ø) |
| daft/ai/_expressions.py | 62.50% <57.14%> (+62.50%) ⬆️ |
| daft/functions/ai/__init__.py | 56.66% <57.14%> (+21.88%) ⬆️ |

... and 30 files with indirect coverage changes


Comment on lines 30 to 35:

```python
def get_image_embedder(self, model: str | None = None, **options: Any) -> ImageEmbedderDescriptor:
    # Raise an error early if PIL is not available.
    if not pil_image.module_available():
        raise ImportError("Pillow is required for image processing but not available")

    return TransformersImageEmbedderDescriptor(model or "openai/clip-vit-base-patch32", options)
```
A contributor commented:
You should just depend on PIL directly in the module, and the ProviderImportError within load_transformers will handle the appropriate error messaging. PIL will need to be added to that dep list — I will leave a comment there as well.
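The centralized pattern the reviewer suggests might look like the sketch below. `ProviderImportError` and `load_transformers` are names taken from the thread, but their shapes here are assumptions, not Daft's actual code: the idea is one consolidated dependency check at provider load time instead of ad hoc checks at each call site.

```python
import importlib.util


class ProviderImportError(ImportError):
    """Raised when a provider's optional dependencies are missing (shape assumed)."""

    def __init__(self, provider: str, missing: list[str]) -> None:
        super().__init__(
            f"Provider '{provider}' requires packages that are not installed: "
            + ", ".join(missing)
        )
        self.missing = missing


def load_transformers(deps: tuple[str, ...] = ("transformers", "torch", "PIL")) -> None:
    """Fail fast with one consolidated message covering every missing package."""
    missing = [d for d in deps if importlib.util.find_spec(d) is None]
    if missing:
        raise ProviderImportError("transformers", missing)


# With PIL added to the dep list, a missing Pillow install is reported here
# alongside any other absent packages, rather than inside get_image_embedder.
try:
    load_transformers(deps=("no_such_dep_abc", "no_such_dep_xyz"))
except ProviderImportError as exc:
    missing_deps = exc.missing
```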

**@desmondcheongzx** (Author) replied:
Ah perfect

Comment on lines +51 to +54:

```python
    Image: TypeAlias = np.ndarray[Any, Any]
else:
    Embedding: TypeAlias = Any
    Image: TypeAlias = Any
```
A contributor commented:
Nice, thanks. A little "eh, why?" right now, but once the typing work catches up you can see how Daft will have its own "Image" and "Embedding" types that are np.ndarray-compatible via https://numpy.org/devdocs/reference/arrays.interface.html
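The np.ndarray compatibility mentioned here comes from NumPy's array interface protocol: any object exposing `__array_interface__` can be consumed by `np.asarray` without copying. A minimal sketch of such a wrapper; the `Image` class below is hypothetical, not Daft's eventual type.

```python
import numpy as np


class Image:
    """Hypothetical wrapper that is np.ndarray-compatible via __array_interface__."""

    def __init__(self, data: np.ndarray) -> None:
        self._data = np.ascontiguousarray(data, dtype=np.uint8)

    @property
    def __array_interface__(self) -> dict:
        # Delegate to the backing array's interface dict; the wrapper keeps
        # self._data (and its buffer) alive while the Image is referenced.
        return self._data.__array_interface__


img = Image(np.zeros((100, 100, 3)))
arr = np.asarray(img)  # views the wrapper's buffer through the array interface
```

This lets downstream code that only knows NumPy consume the custom type transparently, which is presumably why `TypeAlias = np.ndarray[Any, Any]` is an acceptable placeholder today.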

@desmondcheongzx desmondcheongzx enabled auto-merge (squash) September 2, 2025 17:51
@desmondcheongzx desmondcheongzx merged commit e4e52f8 into main Sep 2, 2025
53 of 54 checks passed
@desmondcheongzx desmondcheongzx deleted the desmond/embed_image branch September 2, 2025 18:26
venkateshdb pushed a commit to venkateshdb/Daft that referenced this pull request Sep 6, 2025
(Commit message duplicates the PR description above.)

Co-authored-by: R. C. Howell <[email protected]>
2 participants