Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drive scope to GCP Auth #23569

Closed

Conversation

ricardograca-scratch
Copy link

@ricardograca-scratch ricardograca-scratch commented Oct 11, 2022

This adds the ability to access Google Drive backed BigQuery tables in the Python and Java SDKs. Fixes #23290.

Note: I couldn't find the GCP Auth in the TypeScript and Go SDKs.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@ricardograca-scratch ricardograca-scratch marked this pull request as ready for review October 11, 2022 10:20
@ricardograca-scratch
Copy link
Author

ricardograca-scratch commented Oct 11, 2022

R: @kennknowles Can you review this please?

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @yeandy for label python.
R: @lukecwik for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@codecov
Copy link

codecov bot commented Oct 11, 2022

Codecov Report

Merging #23569 (45cc085) into master (d2b0a26) will decrease coverage by 0.13%.
The diff coverage is 93.91%.

❗ Current head 45cc085 differs from pull request most recent head 72d3dd2. Consider uploading reports for the commit 72d3dd2 to get more accurate results

@@            Coverage Diff             @@
##           master   #23569      +/-   ##
==========================================
- Coverage   73.46%   73.33%   -0.14%     
==========================================
  Files         718      719       +1     
  Lines       95935    95795     -140     
==========================================
- Hits        70477    70248     -229     
- Misses      24147    24236      +89     
  Partials     1311     1311              
Flag Coverage Δ
go 50.88% <94.73%> (+0.01%) ⬆️
python 83.04% <93.79%> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/gcp/bigquery_tools.py 73.30% <ø> (-12.43%) ⬇️
...ache_beam/runners/dataflow/ptransform_overrides.py 90.90% <ø> (+0.78%) ⬆️
sdks/python/apache_beam/io/gcp/bigquery.py 71.31% <60.00%> (-2.94%) ⬇️
...on/apache_beam/runners/dataflow/dataflow_runner.py 80.88% <75.00%> (-2.03%) ⬇️
sdks/go/pkg/beam/core/runtime/harness/harness.go 11.11% <94.73%> (+1.11%) ⬆️
...s/python/apache_beam/utils/multi_process_shared.py 97.36% <97.36%> (ø)
...hon/apache_beam/runners/dataflow/internal/names.py 100.00% <100.00%> (ø)
sdks/python/apache_beam/internal/gcp/json_value.py 85.50% <0.00%> (-2.90%) ⬇️
... and 9 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kennknowles
Copy link
Member

R: @tvalentyn

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@lukecwik
Copy link
Member

I think we want to minimize the number of scopes that are requested on behalf of a user. Users can already set their own credential factory in Beam Java and configure it to request any scopes when creating the credential object. We could use something similar for Beam Python.

For a simple improvement, we could make a pipeline option that each user can configure with the existing scopes being the default?

@tvalentyn
Copy link
Contributor

tvalentyn commented Oct 13, 2022

For a simple improvement, we could make a pipeline option that each user can configure with the existing scopes being the default?

FWIW this is straightforward. pipeline_options are available in both places CLIENT_SCOPES is currently used. we can add, for example an option --custom_oauth_scopes , that will store semicolon-separated strings here:

We can add the contents of that option the the default scopes currently hardcoded.

Note: it is sufficient to do this just for Python, Java already has a capability to create and supply custom credentials.

@lukecwik
Copy link
Member

For a simple improvement, we could make a pipeline option that each user can configure with the existing scopes being the default?

FWIW this is straightforward. pipeline_options are available in both places CLIENT_SCOPES is currently used. we can add, for example an option --custom_oauth_scopes , that will store semicolon-separated strings here:

We can add the contents of that option the the default scopes currently hardcoded.

Note: it is sufficient to do this just for Python, Java already has a capability to create and supply custom credentials.

Can we use a list type instead of using semi-colons as pipeline_options.py supports lists already?

Having an equivalent in Java is convenient so that the user experience is the same. This will also help with xlang so that those additional scopes can be specified for IO reasons.

Also suggest --oauth_scopes instead of --custom_oauth_scopes

@ricardograca-scratch
Copy link
Author

@lukecwik @tvalentyn I can certainly add the parser.add_argument() part, I think, but how do I specify what kind of value it accepts? And do I then just merge that value with CLIENT_SCOPES, or is there a more involved process that's required?

@tvalentyn
Copy link
Contributor

tvalentyn commented Oct 14, 2022

@lukecwik good point, thanks.
@ricardograca-scratch you can mirror the implementation of dataflow_service_options. See:

parser.add_argument(
'--dataflow_service_option',
'--dataflow_service_options',
dest='dataflow_service_options',
action='append',
default=None,
help=(
'Options to configure the Dataflow service. These '
'options decouple service side feature availbility '
'from the Apache Beam release cycle.'
'Note: If set programmatically, must be set as a '
'list of strings'))

if self.dataflow_service_options:

def test_dataflow_service_options(self):

a repeatable option : --oauth_scopes + --oauth_scope (alias) sounds good.

you can then combine the content with the CLIENT_SCOPES wherever scopes are consumed. CLIENT_SCOPES can be renamed into REQUIRED_OAUTH_SCOPES to disambiguate required + additional.

@lukecwik
Copy link
Member

This was superseded by #23644, feel free to reopen PR if you want to continue with this change.

@lukecwik lukecwik closed this Oct 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: Add Google Drive scope to GCP Auth
4 participants