-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Make numpy and pandas optional for ~7 times smaller deps
#153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 8 commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
9e83480
Make `numpy` and `pandas` optional dependencies
jkbrzt acd8b93
Cleanup
jkbrzt 658a4ca
Cleanup
jkbrzt 41fb5d2
Cleanup
jkbrzt 49941e4
Cleanup
jkbrzt 69a42c6
Cleanup
jkbrzt 8bd45b2
Cleanup
jkbrzt 184248c
Move `openpyxl` to `datalib` extras
jkbrzt 1d4a5af
Merge branch 'main' into jakub/data-libraries-optional
jkbrzt cbe9446
Improve errors and instructions
jkbrzt 054f9b4
Add “Optional dependencies” to README
jkbrzt 129e6ba
Merge remote-tracking branch 'origin/jakub/data-libraries-optional' i…
jkbrzt 1ffae5d
Polish README.md
jkbrzt be99210
Polish README.md
jkbrzt 4721f67
Merge branch 'main' into jakub/data-libraries-optional
hallacy File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| """ | ||
| This module helps make data libraries like `numpy` and `pandas` optional dependencies. | ||
|
|
||
| The libraries add up to 130MB+, which makes it challenging to deploy applications | ||
| using this library in environments with code size constraints, like AWS Lambda. | ||
|
|
||
| This module serves as an import proxy and provides a few utilities for dealing with the optionality. | ||
|
|
||
| Since the primary use case of this library (talking to the OpenAI API) doesn’t generally require data libraries, | ||
| it’s safe to make them optional. The rare case when data libraries are needed in the client is handled through | ||
| assertions with instructive error messages. | ||
|
|
||
| See also `setup.py`. | ||
|
|
||
| """ | ||
| try: | ||
| import numpy | ||
| except ImportError: | ||
| numpy = None | ||
|
|
||
| try: | ||
| import pandas | ||
| except ImportError: | ||
| pandas = None | ||
|
|
||
| HAS_NUMPY = bool(numpy) | ||
| HAS_PANDAS = bool(pandas) | ||
|
|
||
| NUMPY_INSTRUCTIONS = "numpy is not installed: pip install openai[datalib]" | ||
| PANDAS_INSTRUCTIONS = "pandas is not installed: pip install openai[datalib]" | ||
|
|
||
|
|
||
| def assert_has_numpy(): | ||
| if not HAS_NUMPY: | ||
| raise Exception(NUMPY_INSTRUCTIONS) | ||
|
|
||
|
|
||
| def assert_has_pandas(): | ||
| if not HAS_PANDAS: | ||
| raise Exception(PANDAS_INSTRUCTIONS) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should call
assert_has_numpyandassert_has_pandasin each function these modules are used so that it's very clear to users what to do to fix the issue (rather than getting a generic'NoneType' object has no attributePython exception).Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
embeddings_utis.pyfile is not imported from anywhere and it’s the only module that importssklearnand other libraries listed in theopenai[embeddings]extra. I couldn’t find any docs, but its usage impliespip install openai[embeddings](which now also ensures numpy/pandas/etc.), so the experience of usingembeddings_utils.pyshould be unchanged.https://github.com/jakubroztocil/openai-python/blob/jakub/data-libraries-optional/setup.py#L46-L53
It could be improved, though. I think each optional extra —
embeddings,wandb, and the newdatalib— would deserve mention in the README. I’ll add a section on the new one, and if you can give me some context on the other two, I’ll be happy to mention them too.I wasn't sure whether you’d be interested in the PR, but it looks like you are, so I’ll polish it a bit: I’m thinking maybe throwing an
ImportErrorinstead of justExceptionfrom theassert_has_*functions, ensuring the error messages are clear, etc.It’s to a degree a backward-incompatible change (for existing users who don’t install
openai[embeddings]and hit this line or useread_any_format()via the CLI ), so it might also be worth bumping the major version.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're right this is an embeddings file so it will have the right dependencies.
Regarding the backward-incompatibility, yes it's unfortunate but personally I think it's probably ok as long as the error is clear and explains how to resolve the problem. Also the line in
read_any_formatis specific to embeddings so it's fine to assume that the embedding deps were installed.See #124 for some historical context too about how deps have been handled too.