Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove pyarrow from setup.py #470

Closed
wants to merge 1 commit into from
Closed

Conversation

bnaul
Copy link
Contributor

@bnaul bnaul commented Jul 11, 2022

#414 added pyarrow to the setup.py as part of adding google-bigquery-storage to install_requires. I understand the motivation for adding google-bigquery-storage, but since that has its own allowed pyarrow version range, it seems simpler to just rely on that rather than specifying yet another pyarrow range here which can get (/ currently is) out of sync with that library.

@tswast what do you think?

@bnaul bnaul requested a review from a team as a code owner July 11, 2022 17:57
@bnaul bnaul requested review from a team and shollyman July 11, 2022 17:57
@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. labels Jul 11, 2022
@tswast
Copy link
Collaborator

tswast commented Jul 11, 2022

If we were to remove it, I think we'd need to make sure we specify the pyarrow extras https://github.com/googleapis/python-bigquery-storage/blob/55d7c5656909b177fe198bd0dc61757f30f5b7a0/setup.py#L34 in the google-cloud-bigquery-storage dependency

"google-cloud-bigquery-storage>=2.0.0,<3.0.0dev",

Alternatively, there may be a google-cloud-bigquery "extra" that'll get the DB-API features we need.

@tswast
Copy link
Collaborator

tswast commented Jul 11, 2022

Related: googleapis/python-bigquery#1196 is an open issue for a request to make pyarrow optional again in google-cloud-bigquery

@bnaul
Copy link
Contributor Author

bnaul commented Jul 14, 2022

If we were to remove it, I think we'd need to make sure we specify the pyarrow extras https://github.com/googleapis/python-bigquery-storage/blob/55d7c5656909b177fe198bd0dc61757f30f5b7a0/setup.py#L34 in the google-cloud-bigquery-storage dependency

Why is that exactly? I don't see any explicit uses of pyarrow here, my thinking was that if other libraries that we depend on here need it then it'll get pulled in during the setup for those libraries. But for example this seems like another good motivation for removing:

Related: googleapis/python-bigquery#1196 is an open issue for a request to make pyarrow optional again in google-cloud-bigquery

If this were made optional, right now there'd need to be a separate PR (like this one) to remove that dependency here. But if you just defer to google-cloud-bigquery on what is required then you don't have to worry about it. Does that make sense or am I missing something about how this library relies on pyarrow?

@shollyman shollyman requested review from tswast, aribray, chalmerlowe and loferris and removed request for shollyman July 22, 2022 23:18
@tswast
Copy link
Collaborator

tswast commented Dec 12, 2022

Does that make sense or am I missing something about how this library relies on pyarrow?

The DB-API package that this depends on can optionally rely on the BigQuery Storage API, which requires pyarrow to deserialize the results. This can be much faster for downloading large results, but won't make much of a difference for smaller result sizes.

@tswast
Copy link
Collaborator

tswast commented Dec 12, 2022

If we were to drop pyarrow, we should also drop bqstorage. Ideally, we'd include these as "extras" and make sure we have tests where these packages are not installed & installed.

@chalmerlowe
Copy link
Collaborator

chalmerlowe commented Dec 14, 2022

Related: googleapis/python-bigquery#1196 is an open issue for a request to make pyarrow optional again in google-cloud-bigquery

1196 got closed via this PR:

https://github.com/googleapis/python-bigquery/pull/1282/files

That PR made pyarrow and google-cloud-bigquery-storage optional in python-bigquery

@meredithslota meredithslota assigned chalmerlowe and unassigned tswast Mar 2, 2023
@chalmerlowe chalmerlowe added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Mar 14, 2023
@chalmerlowe
Copy link
Collaborator

@bnaul closing this PR based on some work related to making both pyarrow and bqstorage optional in PR #847.
Thanks so much for issuing this PR and providing impetus to improving python-bigquery-sqlalchemy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. kokoro:force-run Add this label to force Kokoro to re-run the tests. size: xs Pull request size is extra small.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants