feat(oauth2): add PKCE support for database OAuth2 authentication#37067
Conversation
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Vitor-Avila
left a comment
There was a problem hiding this comment.
LGTM! Also manually tested the flow using an OAuth2 connection with PKCE enforced and it worked. 🙌
| # PKCE code verifier stored in state to be retrieved during token exchange | ||
| "code_verifier": code_verifier, |
There was a problem hiding this comment.
Suggestion: The PKCE code verifier is being embedded into the OAuth2 state payload, which is a signed but unencrypted JWT sent as a query parameter; this allows any interceptor who can see the authorization code to also decode the state and obtain the code_verifier, defeating the security guarantees of PKCE and re‑enabling authorization code interception attacks. [security]
Severity Level: Critical 🚨
- ❌ Database OAuth2 authentication compromise risk
- ❌ OAuth2 redirect flow (DatabaseRestApi.oauth2) security degraded
- ⚠️ Token exchange endpoint can be abused if intercepted
- ⚠️ Unit tests currently assume state contains code_verifier| # PKCE code verifier stored in state to be retrieved during token exchange | |
| "code_verifier": code_verifier, | |
| # PKCE code verifier is managed server-side and not stored in OAuth2 state |
Steps of Reproduction ✅
1. Trigger a DB OAuth2 authorization flow by calling BaseEngineSpec.start_oauth2_dance
(superset/db_engine_specs/base.py:472-516). This method generates a PKCE code_verifier
(base.py:490) and inserts it into the OAuth2 state dict under the "code_verifier" key
(base.py:513-514).
2. The state dict is encoded and sent to the provider via
BaseEngineSpec.get_oauth2_authorization_uri, which calls encode_oauth2_state(state)
(base.py:572-576 -> encode_oauth2_state defined in superset/utils/oauth2.py:152). The
code_verifier is therefore embedded into the encoded state that becomes a query parameter
on the authorization request URL.
3. Inspect superset/utils/oauth2.py: around lines 152-165 — encode_oauth2_state includes
state keys in the JWT payload and (per grep results) will add "code_verifier" to the
encoded payload when present (utils/oauth2.py:163 shows payload["code_verifier"] =
state["code_verifier"]). Because the state is a signed JWT but not encrypted, any party
that can observe the authorization request/redirect (e.g., network traffic, logs, browser
history, referrers) and can decode the JWT (signed but not encrypted) can obtain the
code_verifier.
4. An attacker who intercepts the authorization code and the state (both are returned to
the redirect URI) can extract the code_verifier from the state and use it to redeem the
intercepted authorization code at the token endpoint. The token exchange path that accepts
a code_verifier exists (BaseEngineSpec.get_oauth2_token, base.py:589-614) and is called by
the server-side command that completes the flow
(superset/commands/database/oauth2.py:50-58 passes code_verifier from state into token
request). This demonstrates the concrete end-to-end leak: start_oauth2_dance stores
code_verifier in state -> encode_oauth2_state places it in JWT payload -> token exchange
consumes it from the received state.
5. Conclusion / why this is a problem: Embedding the PKCE secret (code_verifier) into the
client-visible state defeats PKCE's protection model; PKCE requires the verifier to remain
confidential between the client and the authorization server exchange step. The current
implementation places that secret into a signed-but-unencrypted state JWT that accompanies
the authorization code, enabling an interceptor to both obtain the authorization code and
the secret needed to redeem it.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/base.py
**Line:** 513:514
**Comment:**
*Security: The PKCE code verifier is being embedded into the OAuth2 `state` payload, which is a signed but unencrypted JWT sent as a query parameter; this allows any interceptor who can see the authorization code to also decode the state and obtain the `code_verifier`, defeating the security guarantees of PKCE and re‑enabling authorization code interception attacks.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Great catch! Let me fix this.
Code Review Agent Run #58e716Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| key=UUID(tab_id), | ||
| expires_on=datetime.now() + timedelta(minutes=5), | ||
| ) | ||
| db.session.flush() |
There was a problem hiding this comment.
Suggestion: The PKCE code_verifier is written to the key-value store and then OAuth2RedirectError is raised without committing the transaction, so in configurations where sessions are rolled back on exceptions the code_verifier row will never actually be persisted and the subsequent token exchange will not find it, causing the PKCE flow to fail; replacing the flush with a commit (or otherwise ensuring a commit here) guarantees the verifier is durable before redirecting the user. [logic error]
Severity Level: Critical 🚨
- ❌ OAuth2 PKCE token exchange fails for DB OAuth2 flows
- ⚠️ OAuth2 database authentication unreliable in some deployments
- ⚠️ Affects any engine using DATABASE_OAUTH2_CLIENTS config| db.session.flush() | |
| db.session.commit() |
Steps of Reproduction ✅
1. Configure a Database with OAuth2 in app config and mark engine as OAuth2-enabled. (See
superset/db_engine_specs/base.py start_oauth2_dance where PKCE is generated and stored;
PKCE storage occurs at the snippet starting at superset/db_engine_specs/base.py:501.)
2. Trigger an OAuth2 authorization flow from a frontend operation that hits the code path
which raises OAuth2RedirectError. In practice this happens when a DB call raises an
OAuth2RedirectError and execute() calls start_oauth2_dance (execute handles exceptions and
calls cls.start_oauth2_dance(database) — see superset/db_engine_specs/base.py execute
method which flows into start_oauth2_dance).
3. start_oauth2_dance generates a code_verifier and calls KeyValueDAO.create_entry, then
calls db.session.flush() (lines shown in the existing_code block at
superset/db_engine_specs/base.py:501-509).
4. Because an OAuth2RedirectError is raised immediately after (the flow raises
OAuth2RedirectError to redirect the browser — see the raise at
superset/db_engine_specs/base.py around the OAuth redirect), some configurations roll back
the current DB session on exceptions. In those setups the earlier db.session.flush() does
not persist the KeyValueDAO entry (it can be rolled back), so the later token exchange
that looks up the verifier in the KV store fails to find it and PKCE token exchange is
rejected.
5. Reproduce locally by: (a) enabling DATABASE_OAUTH2_CLIENTS for an engine, (b) causing a
DB operation that triggers OAuth2 redirect (the path through execute() -> needs_oauth2()
-> start_oauth2_dance()), (c) inspect the KV store for the tab_id key — the entry will be
missing if the session was rolled back because only flush() was called. Observed failure:
the token exchange step cannot validate PKCE because stored verifier does not exist.
6. Explanation: a flush() only sends SQL to the DB within the transaction; without commit
the row can be rolled back by exception handlers. Replacing flush() with commit() (or
ensuring the entry is persisted outside the request transaction) guarantees durability
across the redirect and subsequent token exchange.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/db_engine_specs/base.py
**Line:** 509:509
**Comment:**
*Logic Error: The PKCE `code_verifier` is written to the key-value store and then `OAuth2RedirectError` is raised without committing the transaction, so in configurations where sessions are rolled back on exceptions the `code_verifier` row will never actually be persisted and the subsequent token exchange will not find it, causing the PKCE flow to fail; replacing the flush with a commit (or otherwise ensuring a commit here) guarantees the verifier is durable before redirecting the user.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.1b1bb73 to
62911a3
Compare
There was a problem hiding this comment.
Code Review Agent Run #c1544d
Actionable Suggestions - 1
-
superset/db_engine_specs/base.py - 1
- Type mismatch in class attribute · Line 533-535
Additional Suggestions - 3
-
superset/commands/database/oauth2.py - 1
-
Reliability: Handle KV store failures for PKCE fallback · Line 64-72The added PKCE code_verifier retrieval calls KeyValueDAO methods that can raise exceptions (e.g., DB connection issues), which would cause the entire OAuth2 token exchange to fail. Since PKCE is optional, wrap these operations in try-except to fall back to non-PKCE flow on failure, ensuring reliability.
Code suggestion
@@ -64,9 +64,11 @@ - if tab_uuid: - kv_value = KeyValueDAO.get_value( - resource=KeyValueResource.PKCE_CODE_VERIFIER, - key=tab_uuid, - codec=JsonKeyValueCodec(), - ) - if kv_value: - code_verifier = kv_value.get("code_verifier") - KeyValueDAO.delete_entry(KeyValueResource.PKCE_CODE_VERIFIER, tab_uuid) + if tab_uuid: + try: + kv_value = KeyValueDAO.get_value( + resource=KeyValueResource.PKCE_CODE_VERIFIER, + key=tab_uuid, + codec=JsonKeyValueCodec(), + ) + if kv_value: + code_verifier = kv_value.get("code_verifier") + KeyValueDAO.delete_entry(KeyValueResource.PKCE_CODE_VERIFIER, tab_uuid) + except Exception: + pass
-
-
superset/superset_typing.py - 1
-
Type Safety Regression · Line 359-359Adding `total=False` makes all OAuth2State fields optional, but the code accesses them directly without checks (e.g., `self._state["tab_id"]` in commands/database/oauth2.py), and the OAuth2StateSchema requires all fields. This creates a type safety mismatch that could allow partial states to pass type checks but fail at runtime validation.
Code suggestion
@@ -359,1 +359,1 @@ - class OAuth2State(TypedDict, total=False): + class OAuth2State(TypedDict):
-
-
superset/utils/oauth2.py - 1
-
Misleading constant naming · Line 43-44The constant name PKCE_CODE_VERIFIER_LENGTH suggests it defines the length of the code verifier string, but it actually specifies the number of random bytes used for generation. This could mislead developers. Consider renaming to CODE_VERIFIER_RANDOM_BYTES for clarity, as the resulting string length (~86 characters) already meets RFC 7636 requirements (43-128 characters).
Code suggestion
@@ -43,2 +43,2 @@ - # PKCE code verifier length (RFC 7636 recommends 43-128 characters) - PKCE_CODE_VERIFIER_LENGTH = 64 + # Number of random bytes for PKCE code verifier (RFC 7636 recommends 43-128 characters for the verifier) + CODE_VERIFIER_RANDOM_BYTES = 64 @@ -56,1 +56,1 @@ - random_bytes = secrets.token_bytes(PKCE_CODE_VERIFIER_LENGTH) + random_bytes = secrets.token_bytes(CODE_VERIFIER_RANDOM_BYTES)
-
Review Details
-
Files reviewed - 10 · Commit Range:
c13ef80..1b1bb73- superset/commands/database/oauth2.py
- superset/db_engine_specs/base.py
- superset/db_engine_specs/gsheets.py
- superset/key_value/types.py
- superset/superset_typing.py
- superset/utils/oauth2.py
- tests/unit_tests/databases/api_test.py
- tests/unit_tests/db_engine_specs/test_base.py
- tests/unit_tests/sql_lab_test.py
- tests/unit_tests/utils/oauth2_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] = ( | ||
| {} | ||
| ) |
There was a problem hiding this comment.
The custom_errors class attribute is typed as dict but assigned a tuple, which will cause an AttributeError at runtime when .items() is called on it in error handling code.
Code suggestion
Check the AI-generated fix before applying
| custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] = ( | |
| {} | |
| ) | |
| custom_errors: dict[Pattern[str], tuple[str, SupersetErrorType, dict[str, Any]]] = {} |
Code Review Run #c1544d
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Code Review Agent Run #adb2e6Actionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
Code Review Agent Run #d55028Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
fe242e0 to
9404d58
Compare
There was a problem hiding this comment.
Code Review Agent Run #00c8f3
Actionable Suggestions - 2
-
superset/db_engine_specs/base.py - 1
- Datetime without timezone information · Line 639-639
-
superset/superset_typing.py - 1
- Incorrect Optional Fields in TypedDict · Line 359-359
Additional Suggestions - 1
-
superset/utils/oauth2.py - 1
-
PKCE verifier length mismatch · Line 44-44The constant PKCE_CODE_VERIFIER_LENGTH is set to 64 and used as bytes in secrets.token_bytes(), producing a verifier of ~85 characters. Given the name and RFC comment suggest 64 characters, update to 48 bytes for exact 64-character output.
Code suggestion
@@ -44,1 +44,1 @@ - PKCE_CODE_VERIFIER_LENGTH = 64 + PKCE_CODE_VERIFIER_LENGTH = 48
-
Review Details
-
Files reviewed - 10 · Commit Range:
c13ef80..9404d58- superset/commands/database/oauth2.py
- superset/db_engine_specs/base.py
- superset/db_engine_specs/gsheets.py
- superset/key_value/types.py
- superset/superset_typing.py
- superset/utils/oauth2.py
- tests/unit_tests/databases/api_test.py
- tests/unit_tests/db_engine_specs/test_base.py
- tests/unit_tests/sql_lab_test.py
- tests/unit_tests/utils/oauth2_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| value={"code_verifier": code_verifier}, | ||
| codec=JsonKeyValueCodec(), | ||
| key=UUID(tab_id), | ||
| expires_on=datetime.now() + timedelta(minutes=5), |
There was a problem hiding this comment.
Replace datetime.now() with datetime.now(timezone.utc) or use datetime.utcnow() to ensure timezone-aware datetime objects.
Code suggestion
Check the AI-generated fix before applying
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
@@ -639,1 +639,1 @@
- expires_on=datetime.now() + timedelta(minutes=5),
+ expires_on=datetime.now(timezone.utc) + timedelta(minutes=5),
Code Review Run #00c8f3
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
|
|
||
|
|
||
| class OAuth2State(TypedDict): | ||
| class OAuth2State(TypedDict, total=False): |
There was a problem hiding this comment.
Adding total=False to OAuth2State makes all fields optional, but the code in encode_oauth2_state, decode_oauth2_state, and other usages assumes they are always present (e.g., direct key access like state["database_id"]). This would cause MyPy type errors and potential runtime KeyErrors if fields are ever missing, though currently all usages provide them.
Code suggestion
Check the AI-generated fix before applying
| class OAuth2State(TypedDict, total=False): | |
| class OAuth2State(TypedDict): |
Code Review Run #00c8f3
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
9404d58 to
b70d7c9
Compare
Vitor-Avila
left a comment
There was a problem hiding this comment.
lgtm! tested again after changes
345e815 to
426412e
Compare
Code Review Agent Run #69678cActionable Suggestions - 0Additional Suggestions - 1
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
40662de to
2aaf769
Compare
Code Review Agent Run #918cc3Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
2aaf769 to
8b7e66d
Compare
Code Review Agent Run #c4f172Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
8b7e66d to
aa059d7
Compare
aa059d7 to
011a0d7
Compare
Code Review Agent Run #455907Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
- theming.mdx: document ECharts array property overrides (PR #37965) — array values like color palettes are fully supported and replaced entirely (not merged); adds Array Property Overrides section with color palette example - configuring-superset.mdx: document PKCE support for database OAuth2 (PR #37067) — add PKCE section under Custom OAuth2 with code_challenge_method config and when to use it - cache.mdx: document ETag support for thumbnail/screenshot endpoints (PR #37663) — conditional GET with If-None-Match to avoid downloading unchanged images - exploring-data.mdx: document SQL Lab UX improvements (PRs #37298, #37694, #37756) — treeview table browser, Ctrl+F find widget, resizable panels; also adds time range natural language expressions section (PR #37098) - creating-your-first-dashboard.mdx: document Table chart features — boolean and categorical conditional formatting (PRs #36338, #37756), gradient toggle (PR #36280), HTML cell rendering with security note (PR #37685), column header tooltips from dataset descriptions (PR #37179), Display Controls modal in dashboard view (PR #36062) - databases.json: update StarRocks supports_catalog and supports_dynamic_catalog to true — the engine spec (PR #37026) already implemented full catalog support with get_catalog_names(), get_default_catalog(), and SHOW CATALOGS; the committed JSON was stale and did not reflect this Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SUMMARY
This PR adds PKCE (Proof Key for Code Exchange - RFC 7636) support to the OAuth2 flow for database connections. PKCE is a security extension to OAuth2 that protects against authorization code interception attacks, which is especially important for public clients or environments where the authorization code could be intercepted.
How it works:
code_verifieris generated and stored in the JWT-encoded statecode_challenge(SHA-256 hash of the verifier, base64url-encoded) is sent to the authorization servercode_verifieris includedChanges:
generate_code_verifier()andgenerate_code_challenge()helper functions insuperset/utils/oauth2.pyOAuth2StateTypedDictto include optionalcode_verifierfieldstart_oauth2_dance()to generate PKCEcode_verifierget_oauth2_authorization_uri()to includecode_challengeandcode_challenge_method=S256parametersget_oauth2_token() to accept and forward thecode_verifier` parameterOAuth2StoreTokenCommand to passcode_verifier` from state to token exchangeAll changes are backward compatible - databases that don't support PKCE will simply ignore the additional parameters, and existing OAuth2 flows without PKCE will continue to work.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - No UI changes. The PKCE parameters are added transparently to the OAuth2 flow.
TESTING INSTRUCTIONS
Added unit tests.
ADDITIONAL INFORMATION