@rchowell rchowell commented Sep 3, 2025

Changes Made

  • Moves the protocols in each provider into a 'protocols' module
  • Moves each provider to a provider.py
  • We can always re-export providers if we want.
  • Defines the TextClassifier protocol and descriptor.
  • Refactors the Provider base class to have consistent NotImplementedErrors.
  • Documents how to create a model-backed expression.

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements a comprehensive refactoring of the daft.ai module structure to better organize AI providers and protocols. The changes move from a flat module structure to a hierarchical one where each provider (OpenAI, SentenceTransformers, Transformers) follows a consistent pattern: providers are implemented in dedicated provider.py files, and protocol implementations are organized in protocols/ subdirectories.

Key architectural changes include:

  • Provider reorganization: Each provider (OpenAI, SentenceTransformers, Transformers) now has its implementation in a provider.py file rather than in __init__.py
  • Protocol separation: Protocol implementations like TextEmbedder and ImageEmbedder are moved to dedicated files within protocols/ subdirectories
  • Consistent error handling: The Provider base class now uses concrete methods with consistent NotImplementedError implementations via a new not_implemented_err helper function
  • Import structure updates: All imports across tests and implementation files are updated to reference the new module locations
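
To illustrate the "consistent error handling" point, here is a minimal sketch of a base class whose concrete methods all raise through one shared helper. Only the names `Provider` and `not_implemented_err` come from the summary above; the helper's signature, the message wording, the `get_text_embedder` / `get_text_classifier` method names, and `EchoProvider` are assumptions made for illustration and may not match the merged code.

```python
from __future__ import annotations


def not_implemented_err(provider: "Provider", method: str) -> NotImplementedError:
    """Build a uniformly worded error (hypothetical signature and message)."""
    return NotImplementedError(
        f"{method} is not currently implemented for the '{provider.name}' provider"
    )


class Provider:
    """Base class with concrete methods that raise unless a subclass overrides them."""

    name = "base"

    # Method names below are assumptions chosen for this sketch.
    def get_text_embedder(self, model: str | None = None, **options):
        raise not_implemented_err(self, method="embed_text")

    def get_text_classifier(self, model: str | None = None, **options):
        raise not_implemented_err(self, method="classify_text")


class EchoProvider(Provider):
    """Toy provider that supports text embedding only."""

    name = "echo"

    def get_text_embedder(self, model: str | None = None, **options):
        # Returns a callable that maps each text to a one-dimensional "embedding".
        return lambda texts: [[float(len(t))] for t in texts]
```

With this shape, a provider overrides only the protocols it supports, and every unsupported call fails with the same style of message instead of each provider writing its own.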

The refactoring maintains full backward compatibility while establishing a cleaner foundation for future AI capabilities. The PR also adds comprehensive documentation for implementing model-backed expressions and updates the API documentation to reflect the new structure. This organizational pattern makes the codebase more modular, maintainable, and ready for extensibility with new protocols like the planned TextClassifier.

Confidence score: 3/5

  • This PR contains significant structural changes that require careful review due to extensive file moves and import path updates across the entire AI module
  • Score reflects the complexity of the refactoring and potential for import-related issues, though the changes follow consistent patterns and maintain functionality
  • Pay close attention to docs/models/index.md, which has structural issues that need correction: a missing Step 2 and inconsistent protocol examples

22 files reviewed, 2 comments

Comment on lines +25 to +27
def embed_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]:
    """Classifies a batch of text strings using the given label(s)."""
    ...
Contributor

logic: Method name embed_text doesn't match the TextClassifier protocol purpose. Should be classify_text to match the docstring and protocol intent.

Suggested change
- def embed_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]:
-     """Classifies a batch of text strings using the given label(s)."""
-     ...
+ def classify_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]:
+     """Classifies a batch of text strings using the given label(s)."""
+     ...

Collaborator

return type should change too
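
Taken together, the two comments in this thread ask for a renamed method and a return type that describes classification results rather than embeddings. The thread does not say what the new return type should be, so the sketch below simply assumes one label per input text. `Label` as a plain `str` and the `KeywordClassifier` toy implementation are hypothetical and not part of the PR.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable

Label = str  # assumption: labels are plain strings


@runtime_checkable
class TextClassifier(Protocol):
    """Protocol sketch using the method name suggested in the review."""

    def classify_text(self, text: list[str], labels: Label | list[Label]) -> list[Label]:
        """Classifies a batch of text strings using the given label(s)."""
        ...


class KeywordClassifier:
    """Toy implementation: picks the first candidate label that appears in the text."""

    def classify_text(self, text: list[str], labels: Label | list[Label]) -> list[Label]:
        # Accept either a single label or a list of labels.
        candidates = [labels] if isinstance(labels, str) else list(labels)
        results = []
        for t in text:
            lowered = t.lower()
            # Fall back to the first candidate when no label occurs in the text.
            match = next((c for c in candidates if c.lower() in lowered), candidates[0])
            results.append(match)
        return results
```

Because the protocol is structural, `KeywordClassifier` satisfies it without inheriting from `TextClassifier`; it only needs a `classify_text` method.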

"""Descriptor for a TextClassifier implementation."""
```

### Step 3. Add to the Provider Interface
Contributor

logic: Missing Step 2 - document jumps from Step 1 to Step 3.

Suggested change
- ### Step 3. Add to the Provider Interface
+ ### Step 2. Add to the Provider Interface

@rchowell rchowell force-pushed the rchowell/provider_modules branch from 17f8ff6 to 623155d Compare September 3, 2025 21:29
@rchowell rchowell merged commit 3adbdba into main Sep 3, 2025
29 checks passed
@rchowell rchowell deleted the rchowell/provider_modules branch September 3, 2025 22:47
venkateshdb pushed a commit to venkateshdb/Daft that referenced this pull request Sep 6, 2025
…ventual-Inc#5125)

## Changes Made

- Moves the protocols in each provider into a 'protocols' module
- Moves each provider to a `provider.py`
- We can always re-export providers if we want.
- Defines the TextClassifier protocol and descriptor.
- Refactors the Provider base class to have consistent NotImplementedErrors.
- Documents how to create a model-backed expression.

## Related Issues

- Eventual-Inc#5113 

## Checklist

- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [x] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)