-
Notifications
You must be signed in to change notification settings - Fork 295
refactor: proper organization for providers and protocols in daft.ai #5125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements a comprehensive refactoring of the daft.ai module structure to better organize AI providers and protocols. The changes move from a flat module structure to a hierarchical one where each provider (OpenAI, SentenceTransformers, Transformers) follows a consistent pattern: providers are implemented in dedicated provider.py
files, and protocol implementations are organized in protocols/
subdirectories.
Key architectural changes include:
- Provider reorganization: Each provider (OpenAI, SentenceTransformers, Transformers) now has its implementation in a
provider.py
file rather than in__init__.py
- Protocol separation: Protocol implementations like
TextEmbedder
andImageEmbedder
are moved to dedicated files withinprotocols/
subdirectories - Consistent error handling: The
Provider
base class now uses concrete methods with consistentNotImplementedError
implementations via a newnot_implemented_err
helper function - Import structure updates: All imports across tests and implementation files are updated to reference the new module locations
The refactoring maintains full backward compatibility while establishing a cleaner foundation for future AI capabilities. The PR also adds comprehensive documentation for implementing model-backed expressions and updates the API documentation to reflect the new structure. This organizational pattern makes the codebase more modular, maintainable, and ready for extensibility with new protocols like the planned TextClassifier.
Confidence score: 3/5
- This PR contains significant structural changes that require careful review due to extensive file moves and import path updates across the entire AI module
- Score reflects the complexity of the refactoring and potential for import-related issues, though the changes follow consistent patterns and maintain functionality
- Pay close attention to
docs/models/index.md
which has structural issues including missing Step 2 and inconsistent protocol examples that need correction
22 files reviewed, 2 comments
def embed_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]: | ||
"""Classifies a batch of text strings using the given label(s).""" | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Method name embed_text
doesn't match the TextClassifier protocol purpose. Should be classify_text
to match the docstring and protocol intent.
def embed_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]: | |
"""Classifies a batch of text strings using the given label(s).""" | |
... | |
def classify_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]: | |
"""Classifies a batch of text strings using the given label(s).""" | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type should change too
docs/models/index.md
Outdated
"""Descriptor for a TextClassifier implementation.""" | ||
``` | ||
|
||
### Step 3. Add to the Provider Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Missing Step 2 - document jumps from Step 1 to Step 3.
### Step 3. Add to the Provider Interface | |
### Step 2. Add to the Provider Interface |
def embed_text(self, text: list[str], labels: LabelLike | list[LabelLike]) -> list[Embedding]: | ||
"""Classifies a batch of text strings using the given label(s).""" | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return type should change too
17f8ff6
to
623155d
Compare
…ventual-Inc#5125) ## Changes Made - Moves each protocols in each provider to a 'protocols' module - Moves each provider to a `provider.py` - We can alway re-export providers if we want. - Defines the TextClassifier protocol and descriptor. - Refactors the Provider base class to have consistent NotImplementedErrors. - Documents how to create a model-backed expression. ## Related Issues - Eventual-Inc#5113 ## Checklist - [x] Documented in API Docs (if applicable) - [x] Documented in User Guide (if applicable) - [x] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation - [x] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
Changes Made
provider.py
Related Issues
Checklist
docs/mkdocs.yml
navigation