-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request]: Add Google Drive scope to GCP Auth #23290
Comments
What happens if Drive API is not enabled for a project but we include it with a scope? |
I imagine the same thing that happens with any other scope, but I'll check and report back. |
@tvalentyn The Drive API being enabled or not has not effect on the result if the extra scope is included. However, without the scope, even if the file is shared with the service account accessing the data this error is displayed and it's not possible to access the table:
Adding the scope resolves the issue. |
Thanks for checking. Would adding |
If not, I would suggest adding a new pipeline option that allows such customizations. Looks like in Java it's possible to specify a custom credential via a factory or credential instance : https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/extensions/gcp/options/GcpOptions.html . In Python we don't have this. But perhaps passing a string of custom scopes to use when authenticating would be sufficient? |
It's already there and it doesn't allow access to Google Drive backed tables, so the drive scope is necessary.
No other scope is treated this way, and it seems like I would need to do a lot of modifications to beam to enable that. I'm no beam expert, not to mention I have no experience developing Java or Go, so the likelihood of me actually pulling off what you propose is very slim, since it would require a substantial amount of time. Can't I just add the necessary scope since it doesn't seem to cause any issues? |
It seems to me that your proposed change is consistent with the current design so we could go ahead as a workaround. But at the same time the current design is odd, since it adds a totally arbitrary set of scopes instead of just the necessary set, so we should fix that. |
I agree with that. The ideal solution for me would be to provide a minimal set that is required to access BigQuery and allow users to specify any extra scopes needed. I'll work on this change as it's currently proposed for now. |
I disagree that adding scopes is a good idea, we should instead allow users to configure the scopes themselves. Adding a credential factory to python or adding a pipeline option that enables configuring these scopes with the default being the currently enabled list makes a lot more sense to me. |
I agree with your comment. I am separating the immediate need with the longer term refactor. Currently, we have an arbitrary set of scopes that probably never should have been done this way. This proposal adds one, keeping the design as-is. Refactoring the design to use only minimal scopes specified by the user, or perhaps by the IOs in the pipeline, is still the best way to go in the future. |
I would disagree, having existing users who upgrade to this version pick up additional scopes that they may not have been aware of is not a good experience. |
Again, I agree with you, but that is status quo. That is exactly what happened when SpannerIO was added. |
If it is easy to do The Right Thing then go ahead and do it. But forcing arbitrary discipline on new contributors that we ourselves never complied with is not how we should run things. |
Continuing to do something that was done incorrectly in the past and digging a bigger hole for future maintenance is a problem. |
It might have been covered by the google-cloud-platform scope. Java doesn't have spanner scopes, not sure if python had to reference them explicitly. |
5b6a0ea#diff-21a78a52eca0c898070d58302127a9bb5cdb5de512ec16b9b3d945e0b84b694c only modifies Python's scopes and not Java's, I think @tvalentyn point still is valid. |
We should be guiding contributors to produce the right solution, this isn't arbitrary. |
#23644 is a proposed solution. |
This allows users to limit scopes dependent on their pipeline. fixes apache#23290
* Make GCP OAuth scopes configurable via pipeline options. This allows users to limit scopes dependent on their pipeline. fixes #23290 * Fix usages of get credentials where no pipeline options was being used to pass in a default instance * Address review comments Drop the non pipelineoptions routes in _add_impersonation_...
Thanks everyone! The final solution looks great, and I wouldn't have minded doing the right thing to help the project, but it would have taken me a long time since I'm not familiar with the project internals. |
I'm happy that you're happy with the solution. Hopefully we can spend more time with you on your next change. |
What would you like to happen?
The current Python SDK doesn't include the
https://www.googleapis.com/auth/drive
scope when sending the authentication request to Google Cloud. This means that it's not possible to access Google Drive backed tables in BigQuery.Adding the scope does indeed provide the required access. I'm only interested in the Python SDK, but this change should be easy to apply on all other SDKs as well if there's interest in that.
I can work on this change and submit a pull request.
Issue Priority
Priority: 2
Issue Component
Component: sdk-py-core
The text was updated successfully, but these errors were encountered: