feat(ibis): Athena default credential chain authentication support#1362
feat(ibis): Athena default credential chain authentication support#1362goldmedal merged 5 commits intoCanner:mainfrom
Conversation
WalkthroughAthenaConnectionInfo makes AWS credential and role/web‑identity fields optional and adds new fields. get_athena_connection builds credentials dynamically (web‑identity STS, static keys, or default chain) and calls ibis. Tests add OIDC/default‑credential fixtures, a parameterized Athena auth‑modes test, and update decimal/timestamp expectations. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Test / API Client
participant Builder as get_athena_connection
participant STS as AWS STS
participant Ibis as ibis.athena.connect
Client->>Builder: provide AthenaConnectionInfo
alt web_identity_token & role_arn present
Builder->>STS: AssumeRoleWithWebIdentity(role_arn, web_identity_token, role_session_name?)
STS-->>Builder: temp creds (access_key, secret, session_token)
Note right of Builder: include temp creds in kwargs
else aws_access_key_id & aws_secret_access_key present
Builder->>Builder: include static access_key, secret, aws_session_token?
else
Builder->>Builder: omit credentials (use default AWS chain)
end
Builder->>Builder: include s3_staging_dir, schema_name, region_name?
Builder->>Ibis: connect(**kwargs)
Ibis-->>Builder: connection
Builder-->>Client: return connection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
ibis-server/app/model/data_source.py (1)
269-273: Consider validating that role_arn is required when web_identity_token is provided.While PyAthena may handle this validation internally, it would be better to fail fast with a clear error message if
web_identity_tokenis provided withoutrole_arn.Add validation before the connection attempt:
# ── 1️⃣ Web Identity Token flow (Google OIDC → AWS STS) ─── if info.web_identity_token and info.role_arn: kwargs["web_identity_token"] = info.web_identity_token.get_secret_value() kwargs["role_arn"] = info.role_arn.get_secret_value() if info.role_session_name: kwargs["role_session_name"] = info.role_session_name.get_secret_value() +elif info.web_identity_token or info.role_arn: + raise WrenError( + ErrorCode.INVALID_CONNECTION_INFO, + "Both web_identity_token and role_arn must be provided together for OIDC authentication" + )ibis-server/app/model/__init__.py (1)
119-153: Consider adding validation for mutually exclusive authentication methods.While the current implementation allows any combination of authentication fields, it might be clearer to validate that only one authentication method is used at a time. However, this could also be deferred to runtime in the connection logic, which may provide better error messages with context.
If you want to enforce validation at the model level, you could add a Pydantic validator:
from pydantic import model_validator class AthenaConnectionInfo(BaseConnectionInfo): # ... existing fields ... @model_validator(mode='after') def validate_auth_method(self): """Ensure only one authentication method is provided.""" has_static = bool(self.aws_access_key_id and self.aws_secret_access_key) has_oidc = bool(self.web_identity_token and self.role_arn) if has_static and has_oidc: raise ValueError( "Cannot provide both static credentials and web identity token authentication" ) if self.web_identity_token and not self.role_arn: raise ValueError( "role_arn is required when web_identity_token is provided" ) if self.role_arn and not self.web_identity_token: raise ValueError( "web_identity_token is required when role_arn is provided" ) return selfThis is optional since the connection logic already handles these cases gracefully.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
ibis-server/app/model/__init__.py(1 hunks)ibis-server/app/model/data_source.py(1 hunks)ibis-server/tests/routers/v3/connector/athena/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/athena/test_query.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
manifest_str(31-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (2)
ibis-server/app/model/data_source.py (1)
250-291: Well-structured authentication flow implementation.The refactored connection logic correctly implements three authentication methods with proper fallback behavior. The conditional structure ensures mutually exclusive credential flows, and all optional parameters are handled appropriately.
ibis-server/app/model/__init__.py (1)
112-166: Well-documented model with comprehensive authentication support.The refactored
AthenaConnectionInfomodel provides clear support for all three authentication methods with excellent documentation. The optional fields and defaults are appropriate for the flexible authentication flows.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
ibis-server/app/model/__init__.py (1)
136-153: Consider validating web identity credential requirements.The web identity fields are well-documented. However,
web_identity_tokenrequiresrole_arnto function correctly for AssumeRoleWithWebIdentity authentication. Consider adding validation to ensure they're provided together.This can be added to the same
@model_validatorsuggested for static credentials:@model_validator(mode='after') def validate_credential_pairs(self): # Validate web identity credentials if self.web_identity_token is not None and self.role_arn is None: raise ValueError( "role_arn must be provided when using web_identity_token" ) return selfibis-server/tests/routers/v3/connector/athena/test_query.py (1)
216-233: Fixture exists; remove verification concern but retain assertion improvement suggestion.The
connection_info_default_credential_chainfixture is properly defined atibis-server/tests/routers/v3/connector/athena/conftest.py:35. The original concern about a missing fixture is resolved.The test parametrization and fixture retrieval are correct. However, the assertion quality suggestion remains valid—the test only verifies response structure without validating actual query results or data correctness.
assert response.status_code == 200 result = response.json() assert "columns" in result assert "data" in result + assert len(result["columns"]) > 0 + assert len(result["data"]) > 0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/app/model/__init__.py(1 hunks)ibis-server/app/model/data_source.py(1 hunks)ibis-server/tests/routers/v2/connector/test_athena.py(3 hunks)ibis-server/tests/routers/v3/connector/athena/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/athena/test_query.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- ibis-server/app/model/data_source.py
- ibis-server/tests/routers/v3/connector/athena/conftest.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
connection_info(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (5)
ibis-server/tests/routers/v2/connector/test_athena.py (2)
62-62: LGTM: Test expectation updated to match new timestamp expression format.The change from a bare timestamp literal to a CAST expression aligns with how Athena now represents timestamp values in the manifest.
116-116: The decimal precision change is intentional and backed by application logic, but needs manual verification against live Athena.The
totalpricedtype change fromdecimal128(15, 2)todecimal128(38, 9)is not a test-only modification. It's driven by the_round_decimal_columnsmethod inibis-server/app/model/connector.py(lines 210-217), which hardcodes precision to 38 (PyArrow's maximum) and defaults scale to 9 for all Decimal types.This means every decimal column returned by Athena queries will be cast to
decimal128(38, 9). The data values remain correct. However, verify that this aligns with the actual behavior of the Canner Ibis fork (canner/10.8.1) and current Athena type inference, as this precision appears to differ from what was previously expected.ibis-server/tests/routers/v3/connector/athena/test_query.py (1)
93-93: LGTM: Decimal precision update consistent with v2 tests.The
totalpricedtype update todecimal128(38, 9)aligns with the same change in v2 tests, reflecting Athena's default decimal precision.ibis-server/app/model/__init__.py (2)
161-166: Verify the suggested default value approach—Pydantic v2 strict typing may reject plain strings.The current implementation
default=SecretStr("default")works but is not idiomatic for Pydantic v2. However, the review's suggestion to usedefault="default"may fail type validation since the field is typed asSecretStr | Noneand Pydantic v2 has strict type checking.The recommended approach for Pydantic v2 is to use a default factory instead:
schema_name: SecretStr | None = Field( alias="schema_name", description="The database name in Athena. Defaults to 'default'.", examples=["default"], - default=SecretStr("default"), + default_factory=lambda: SecretStr("default"), )This is idiomatic for Pydantic v2 and avoids potential type coercion issues. However, verify that:
- The plain string default suggested in the review actually works with your Pydantic v2 configuration
- All code paths that rely on
schema_namebeing aSecretStr(not a plain string) function correctly
119-133: No changes needed. The code already validates credential pairing correctly.The review comment assumes no credential pairing validation exists, but the codebase already implements this on line 268 of
ibis-server/app/model/data_source.py: theelifcondition usesand, requiring bothaws_access_key_idandaws_secret_access_keyto be truthy. If only one is provided, the condition fails and the code gracefully falls back to the default AWS credential chain (environment variables,~/.aws/credentials, or IAM role), which is the intended behavior per the code comments (lines 277-280).This design is correct: it prevents incomplete static credentials from being used, yet remains flexible for cases where credentials are provided through multiple sources (model fields, environment variables, files).
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)
223-229: Preferbase_urlover a hard-coded path.We already import
base_url, so duplicating/v3/connector/athenahere will drift if the route changes. Please reuse the shared constant for consistency.- response = await client.post( - url="/v3/connector/athena/query", + response = await client.post( + url=f"{base_url}/query",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
manifest_str(137-138)ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
connection_info(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)
89-99: Updated decimal precision matches backend change.The new expectation for
decimal128(38, 9)keeps the test aligned with the Athena type mapping and looks good.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ibis-server/app/model/data_source.py(2 hunks)ibis-server/tests/routers/v3/connector/athena/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/athena/test_query.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (3)
ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
manifest_str(31-32)ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
connection_info(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: ci
- GitHub Check: ci
🔇 Additional comments (6)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
93-93: LGTM: Test expectation updated for higher precision decimal type.The decimal precision update from
decimal128(15, 2)todecimal128(38, 9)aligns with the changes in Athena connection handling and is consistently applied across test expectations.
216-262: LGTM: Parameterized test correctly exercises multiple authentication modes.The test properly validates Athena connectivity across three authentication strategies (static credentials, default credential chain, and OIDC). All referenced fixtures are defined in
conftest.py, and the test logic correctly mirrors the existingtest_queryfunction.ibis-server/tests/routers/v3/connector/athena/conftest.py (2)
34-47: LGTM: Default credential chain fixture correctly validates environment setup.The fixture appropriately checks for AWS credentials in the environment (to ensure the test can succeed) while intentionally omitting them from the connection info dict, allowing PyAthena to use the default credential chain resolution.
50-64: LGTM: OIDC fixture correctly configures web identity authentication.The fixture properly reads OIDC-specific environment variables and provides all required fields (web_identity_token, role_arn) for the AssumeRoleWithWebIdentity flow. Using separate environment variable names for OIDC-specific configuration is a good practice.
ibis-server/app/model/data_source.py (2)
252-259: LGTM: Base connection parameters correctly initialized.The kwargs dictionary properly initializes required fields (s3_staging_dir, schema_name) and conditionally adds the optional region_name, correctly using
.get_secret_value()for SecretStr fields.
284-299: LGTM: Static credentials and default chain fallback correctly implemented.The function properly handles static AWS credentials (with optional session token) and falls back to the default credential chain when neither OIDC nor static credentials are provided. The use of
**kwargsforibis.athena.connectis correct.
ef97a49 to
44655e3
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
ibis-server/app/model/__init__.py (1)
116-157: AthenaConnectionInfo additions align with credential modes; fix region_name typingThe new optional credential fields (static keys, session token, web identity, role ARN/session name) and the updated
schema_namedefault are well-structured and match the intended auth flows (static, default chain, OIDC).One remaining nit:
- Line 160–164:
region_nameis annotated asSecretStrbut hasdefault=Noneand the description says it’s optional. For consistency with its actual usage and with the other optional fields, this should beSecretStr | None.This was already flagged in a previous review; updating the annotation would resolve the inconsistency between the type hint, default, and description.
Also applies to: 160-170
🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)
216-262: Parametrized Athena auth‑modes test looks good; reuse base_url for consistencyThe new
test_query_athena_modescorrectly exercises the three connection info variants and verifies both data and dtypes, giving good coverage across static, default‑chain, and OIDC flows.Minor suggestion: you could use
f"{base_url}/query"instead of the hardcoded"/v3/connector/athena/query"to keep the endpoint path centralized with the rest of the file.ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
34-64: New credential‑mode fixtures are consistent with AthenaConnectionInfo
connection_info_default_credential_chaincorrectly omits explicit AWS keys so the connector can rely on the default credential chain, while still skipping cleanly whenAWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEYaren’t present.connection_info_oidcmatches the new web‑identity/role fields (web_identity_token,role_arn) and uses separate OIDC‑specific env vars, which keeps the modes clearly separated.If you ever want to test non‑env default‑chain sources (e.g., instance profiles), you might relax the env‑var check and instead rely on the connector to fail with a clear error, but the current skip behavior is fine for CI.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/app/model/__init__.py(1 hunks)ibis-server/app/model/data_source.py(2 hunks)ibis-server/tests/routers/v2/connector/test_athena.py(3 hunks)ibis-server/tests/routers/v3/connector/athena/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/athena/test_query.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- ibis-server/app/model/data_source.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
manifest_str(31-32)ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
connection_info(24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: ci
- GitHub Check: ci
🔇 Additional comments (2)
ibis-server/tests/routers/v2/connector/test_athena.py (1)
61-64: timestamptz expression and widened decimal dtype look consistent
- The explicit
CAST(TIMESTAMP '2024-01-01 23:59:59 UTC' AS timestamp)keeps the same literal value while making the resulting type explicit, which should make type inference more predictable.- Updating
totalpricetodecimal128(38, 9)in both expectations aligns with the higher‑precision backend type and keeps v2 tests consistent with the v3 tests.No functional issues spotted here.
Also applies to: 112-123, 152-163
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)
89-99: Dtype update for totalprice matches backend precisionThe widened
totalpriceexpectation todecimal128(38, 9)is consistent with the other Athena tests and the higher‑precision backend type. No issues here.
Previously, users were required to explicitly provide
aws_access_key_idandaws_secret_access_keyin the connection info.Now, the connector can authenticate using any of the following methods supported by PyAthena / boto3:
Default AWS credential chain (e.g., EC2/ECS IAM roles, ~/.aws/credentials, environment variables)
Static credentials (aws_access_key_id, aws_secret_access_key)
Web Identity / OIDC federation via AssumeRoleWithWebIdentity
OIDC connection info example
Generate a Web Identity Token from gcloud cli
PyAthena credential doc
Summary by CodeRabbit
New Features
Tests