feat(oauth2): add support for trino#30081
Conversation
e08135d to
a79dbe0
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #30081 +/- ##
===========================================
+ Coverage 60.48% 83.91% +23.42%
===========================================
Files 1931 534 -1397
Lines 76236 38788 -37448
Branches 8568 0 -8568
===========================================
- Hits 46114 32549 -13565
+ Misses 28017 6239 -21778
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Looks like @betodealmeida was working on #30126 at the same time which might be related -- Beto, are you able to review this one? |
|
tagging @betodealmeida as he has most context to review this |
superset/utils/oauth2.py
Outdated
There was a problem hiding this comment.
I might be mistaken, but I believe this validation is only applied to the client_info contained within encrypted_extra when provided by a user. Is the intent that this is where the request_content_type is going to be provided, or is it going to be set as a default for the Trino engine spec?
There was a problem hiding this comment.
Right, I think the idea is that DB engine specs can have a default content type (defined in the oauth2_token_request_type class attribute), but it can be overridden on a per-database basis by setting it in the encrypted_extra.
betodealmeida
left a comment
There was a problem hiding this comment.
Awesome! I left a few comments but nothing blocking.
superset/db_engine_specs/base.py
Outdated
There was a problem hiding this comment.
Nice!
I think that the most common (standard?) workflow is to send form encoded data, and not JSON. It's just that the first OAuth2 implementation done for Superset targeted GSheets, and Google uses JSON instead of form encoded (it's the same for BigQuery, for example). But now changing the default would break existing databases, so we need to leave JSON as the default.
There was a problem hiding this comment.
I agree that data is likely the more common implementation. As we're still in early days, I feel it's well motivated to introduce a breaking change, as long as we have an UPDATING.md explaining the required steps to get it to work again. This in the interest of having a clean/idiomatic API in the long term..
superset/utils/oauth2.py
Outdated
There was a problem hiding this comment.
Right, I think the idea is that DB engine specs can have a default content type (defined in the oauth2_token_request_type class attribute), but it can be overridden on a per-database basis by setting it in the encrypted_extra.
superset/db_engine_specs/trino.py
Outdated
There was a problem hiding this comment.
Nice, I like this approach!
You can simplify this to:
return isinstance(instance, HttpError) and "error 401: b'Invalid credentials'" in str(instance)There was a problem hiding this comment.
| Config for GSheets OAuth2. | |
| Config for Trino OAuth2. |
|
tried my luck fixing the merge conflict in the GitHub web editor, didn't pull the branch and run CI, so no pre-commit checks.... let's see if I get lucky, if not feel free to disregard my commit |
villebro
left a comment
There was a problem hiding this comment.
A few comments, other than that LGTM
superset/db_engine_specs/base.py
Outdated
There was a problem hiding this comment.
I agree that data is likely the more common implementation. As we're still in early days, I feel it's well motivated to introduce a breaking change, as long as we have an UPDATING.md explaining the required steps to get it to work again. This in the interest of having a clean/idiomatic API in the long term..
superset/db_engine_specs/base.py
Outdated
There was a problem hiding this comment.
nit: If we're defaulting to "", when might these still be None?
|
@villebro @betodealmeida looks like we have a few nits and unresolved questions here, but the thread has gone quiet. If @joaoferrao doesn't respond, would anyone want to block this, or should someone smash that merge button? |
|
(note that code suggestions can be committed here by the person offering them, btw!) |
|
Sorry, I will take a look at the comments over the coming days. I’ve been rather busy so I haven’t had time to refocus attention again.
|
…d query execution
1bb9303 to
4011e69
Compare
|
@betodealmeida @villebro I have addressed the main comments, I believe. One question I still made when opening my PR is still open:
I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the |
|
I’ve been working on setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (#30081). However, despite configuring authorization_request_uri, token_request_uri, and handling impersonation, I’m still encountering issues with 401 Unauthorized errors when trying to connect to Trino. trino_oauthDATABASE_OAUTH2_REDIRECT_URI: "http://localhost:8088/api/v1/database/oauth2/" {"connect_args":{"http_scheme":"https"}} Impersonate: true is in place as well and UI shows the correct configs. I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated! Thank you! |
A bit more detail would help fine tune the issue. Can I ask if you, even if redundantly:
If you are sure you are getting the access token back from the IdP (after being requested to authenticate), then this is what is sent to trino: if backend_name == "trino" and username is not None:
connect_args["user"] = username
if access_token is not None:
http_session = requests.Session()
http_session.headers.update({"Authorization": f"Bearer {access_token}"})
connect_args["http_session"] = http_session |
|
@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? |
Yeah, this was a chicken-and-egg problem. The token is stored associated with the database ID, so you can't do OAuth2 before the database is created. But for the database to be created, we require a successful connection, which in this case is impossible. I added some logic in #30071 for Snowflake so that if the test connection fails because OAuth2 is needed then the test connection command succeeds: Wasn't this logic enough for the same flow with Trino? |
|
@betodealmeida that is indeed very similar to what I did for me to test the feature, just that I didn't publish it, because I didn't delve enough to the Thanks for the feedback. I see all checks are passed and my PR is still approved. Do I assume correctly I should merge it? |
Take what I'm saying with a grain of salt, but the point of impersonation is for your superset user to be impersonated upon the request to trino. Whatever the username is for superset, is going to be used in the |
|
@joaoferrao I did verify the command trino --server='' --external-authentication=true with my okta username and that works on the browser but throws cannot impersonate user error in the end on the cluster. Tried with making that setting false still no luck. |
Does it or doesn't in work when you remove superset from the equation? The previous comment left me a bit confused about the scenario you described.
|
|
@joaoferrao thanks for all the pointers. Noticed that trino is using the email and superset uses okta token as the username hence the mismatch and 401. I will need to change the setup on either trino or superset side to get the authentication correct. |
|
Thanks for working on this, @joaoferrao! |
Hello @joaoferrao, do you have a solution to create the connection to Trino with oauth 2 enabled in a production environment ? |
|
@joaoferrao @betodealmeida greetings! I asked sometime ago some questions about setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (#30081). We have got past the username part and now superset recognizes email as username. I am having trouble now setting up the database connection. It keeps throwing unauthorized error. trino_oauth {"connect_args":{"http_scheme":"https"}} Impersonate: true is in place as well and UI shows the correct configs. Any guidance or insights on further troubleshooting would be greatly appreciated! Thank you! |
SUMMARY
Under #27631 under #20300
It also fixes an issue not totally resolved here #29981, which is required for OAuth2 to work for trino.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Screen.Recording.2024-08-31.at.16.57.22.mov
TESTING INSTRUCTIONS
superset_docker_config.py:ADDITIONAL INFORMATION
Need feedback with:
We still need to trigger this OAuth2 dance in, at least, 2 contexts (I don't know much about superset, possibility there are more):