feat(ibis): Athena default credential chain authentication support #1362

Merged
goldmedal merged 5 commits into Canner:main from douenergy:athena-default-credencial-chain
Nov 17, 2025
Conversation

Contributor

@douenergy douenergy commented Nov 2, 2025

Previously, users were required to explicitly provide aws_access_key_id and aws_secret_access_key in the connection info.

Now, the connector can authenticate using any of the following methods supported by PyAthena / boto3:

  • Default AWS credential chain (e.g., EC2/ECS IAM roles, ~/.aws/credentials, environment variables)

  • Static credentials (aws_access_key_id, aws_secret_access_key)

  • Web Identity / OIDC federation via AssumeRoleWithWebIdentity

OIDC connection info example

{
  "s3_staging_dir": "s3://my-bucket/athena-staging/",
  "region_name": "us-east-1",
  "schema_name": "default",
  "role_arn": "arn:aws:iam::123456789012:role/google-oidc-athena",
  "role_session_name": "my-oidc-session",
  "web_identity_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
}

Generate a web identity token with the gcloud CLI

gcloud auth print-identity-token \
  --audiences=https://sts.amazonaws.com \
  > google-jwt.json

PyAthena credential doc
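
Before pasting the token produced by the gcloud command into web_identity_token, it can help to check which audience it carries. The following is a minimal sketch, assuming the token is a standard three-segment JWT; the helper names decode_jwt_claims and has_audience are hypothetical, and the code only inspects the payload, it does not verify the signature.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def has_audience(token: str, expected: str = "https://sts.amazonaws.com") -> bool:
    """Check whether the token's `aud` claim matches the expected audience."""
    aud = decode_jwt_claims(token).get("aud")
    # `aud` may be a single string or a list of strings.
    return expected in aud if isinstance(aud, list) else aud == expected
```

For the command above, this would mean reading the contents of google-jwt.json and passing the string to has_audience. Whether the IAM role's trust policy actually checks this audience depends on how the role is configured.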

Summary by CodeRabbit

  • New Features

    • Athena connections now support OIDC web‑identity role assumption, optional AWS session tokens and role session names, and optional region/schema with sensible defaults; falls back to the default AWS credential chain when explicit credentials aren’t provided.
  • Tests

    • Added tests covering multiple Athena authentication modes and updated query expectations to use higher‑precision decimals and adjusted timestamp representation.

@github-actions github-actions bot added the ibis and python (Pull requests that update Python code) labels on Nov 2, 2025
Contributor

coderabbitai bot commented Nov 2, 2025

Walkthrough

AthenaConnectionInfo makes AWS credential and role/web‑identity fields optional and adds new fields. get_athena_connection builds credentials dynamically (web‑identity STS, static keys, or default chain) and calls ibis. Tests add OIDC/default‑credential fixtures, a parameterized Athena auth‑modes test, and update decimal/timestamp expectations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Model: AthenaConnectionInfo (ibis-server/app/model/__init__.py) | Made AWS credential fields optional (aws_access_key_id, aws_secret_access_key, aws_session_token) and added optional web_identity_token, role_arn, role_session_name; made region_name and schema_name optional with defaults; updated field descriptions. |
| Connection builder (ibis-server/app/model/data_source.py) | get_athena_connection now conditionally assembles kwargs: always includes s3_staging_dir and schema_name; includes region_name if present; supports AssumeRoleWithWebIdentity flow (calls STS, injects temp creds), static keys (+ optional session token), or omits creds to use default AWS credential chain; calls ibis.athena.connect(**kwargs). |
| Tests: Athena fixtures & tests (ibis-server/tests/routers/v3/connector/athena/conftest.py, ibis-server/tests/routers/v3/connector/athena/test_query.py, ibis-server/tests/routers/v2/connector/test_athena.py) | Added fixtures connection_info_default_credential_chain and connection_info_oidc; added test_query_athena_modes exercising three auth modes; updated expected dtype for totalprice to decimal128(38, 9); adjusted timestamptz expression in v2 manifest tests. |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Test / API Client
    participant Builder as get_athena_connection
    participant STS as AWS STS
    participant Ibis as ibis.athena.connect

    Client->>Builder: provide AthenaConnectionInfo
    alt web_identity_token & role_arn present
        Builder->>STS: AssumeRoleWithWebIdentity(role_arn, web_identity_token, role_session_name?)
        STS-->>Builder: temp creds (access_key, secret, session_token)
        Note right of Builder: include temp creds in kwargs
    else aws_access_key_id & aws_secret_access_key present
        Builder->>Builder: include static access_key, secret, aws_session_token?
    else
        Builder->>Builder: omit credentials (use default AWS chain)
    end
    Builder->>Builder: include s3_staging_dir, schema_name, region_name?
    Builder->>Ibis: connect(**kwargs)
    Ibis-->>Builder: connection
    Builder-->>Client: return connection
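
The branching in the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not the code in data_source.py: AthenaInfo is a hypothetical stand-in for AthenaConnectionInfo that uses plain strings instead of SecretStr, assume_role is a hypothetical injected callable standing in for the STS exchange, and DEFAULT_SESSION_NAME is an invented fallback. With boto3, that callable would typically wrap the STS client's assume_role_with_web_identity(RoleArn=..., RoleSessionName=..., WebIdentityToken=...) call and read the Credentials entry of the response.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AthenaInfo:
    """Hypothetical stand-in for AthenaConnectionInfo (plain strings only)."""
    s3_staging_dir: str
    schema_name: str = "default"
    region_name: Optional[str] = None
    aws_access_key_id: Optional[str] = None
    aws_secret_access_key: Optional[str] = None
    aws_session_token: Optional[str] = None
    web_identity_token: Optional[str] = None
    role_arn: Optional[str] = None
    role_session_name: Optional[str] = None


# Hypothetical signature: (role_arn, token, session_name) -> (key, secret, session_token)
AssumeRole = Callable[[str, str, str], Tuple[str, str, str]]

DEFAULT_SESSION_NAME = "athena-oidc-session"  # invented fallback, not from the PR


def build_athena_kwargs(info: AthenaInfo, assume_role: AssumeRole) -> dict:
    """Assemble connection kwargs, choosing exactly one credential mode."""
    kwargs = {"s3_staging_dir": info.s3_staging_dir, "schema_name": info.schema_name}
    if info.region_name:
        kwargs["region_name"] = info.region_name

    if info.web_identity_token and info.role_arn:
        # 1. Web identity: exchange the OIDC token for temporary credentials.
        key, secret, token = assume_role(
            info.role_arn,
            info.web_identity_token,
            info.role_session_name or DEFAULT_SESSION_NAME,
        )
        kwargs.update(
            aws_access_key_id=key,
            aws_secret_access_key=secret,
            aws_session_token=token,
        )
    elif info.aws_access_key_id and info.aws_secret_access_key:
        # 2. Static keys: both halves must be present; session token is optional.
        kwargs.update(
            aws_access_key_id=info.aws_access_key_id,
            aws_secret_access_key=info.aws_secret_access_key,
        )
        if info.aws_session_token:
            kwargs["aws_session_token"] = info.aws_session_token
    # 3. Otherwise add no credentials so the default AWS chain is used.
    return kwargs
```

Injecting the exchange as a callable keeps the selection logic testable without network access. Note that an incomplete static pair (only one of the two keys) falls through to the default chain in this sketch, which matches the fallback behavior described later in the review.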

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • Credential‑flow prioritization and exclusivity in get_athena_connection
    • Safe unwrapping/handling of SecretStr | None
    • STS AssumeRoleWithWebIdentity usage and region handling
    • Test fixture environment skip logic and updated dtype expectations

Possibly related PRs

Suggested labels

athena

Suggested reviewers

  • goldmedal

Poem

🐰 I found a token, soft and bright,
I hopped through roles into the night,
Keys optional, defaults hum,
Queries run — the rows come,
I nibble tests and dance in light.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat(ibis): Athena default credential chain authentication support' clearly and concisely summarizes the main change: adding support for default AWS credential chain authentication in Athena connections. |
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
ibis-server/app/model/data_source.py (1)

269-273: Consider validating that role_arn is required when web_identity_token is provided.

While PyAthena may handle this validation internally, it would be better to fail fast with a clear error message if web_identity_token is provided without role_arn.

Add validation before the connection attempt:

# ── 1️⃣ Web Identity Token flow (Google OIDC → AWS STS) ───
if info.web_identity_token and info.role_arn:
    kwargs["web_identity_token"] = info.web_identity_token.get_secret_value()
    kwargs["role_arn"] = info.role_arn.get_secret_value()
    if info.role_session_name:
        kwargs["role_session_name"] = info.role_session_name.get_secret_value()
+elif info.web_identity_token or info.role_arn:
+    raise WrenError(
+        ErrorCode.INVALID_CONNECTION_INFO,
+        "Both web_identity_token and role_arn must be provided together for OIDC authentication"
+    )
ibis-server/app/model/__init__.py (1)

119-153: Consider adding validation for mutually exclusive authentication methods.

While the current implementation allows any combination of authentication fields, it might be clearer to validate that only one authentication method is used at a time. However, this could also be deferred to runtime in the connection logic, which may provide better error messages with context.

If you want to enforce validation at the model level, you could add a Pydantic validator:

from pydantic import model_validator

class AthenaConnectionInfo(BaseConnectionInfo):
    # ... existing fields ...
    
    @model_validator(mode='after')
    def validate_auth_method(self):
        """Ensure only one authentication method is provided."""
        has_static = bool(self.aws_access_key_id and self.aws_secret_access_key)
        has_oidc = bool(self.web_identity_token and self.role_arn)
        
        if has_static and has_oidc:
            raise ValueError(
                "Cannot provide both static credentials and web identity token authentication"
            )
        
        if self.web_identity_token and not self.role_arn:
            raise ValueError(
                "role_arn is required when web_identity_token is provided"
            )
        
        if self.role_arn and not self.web_identity_token:
            raise ValueError(
                "web_identity_token is required when role_arn is provided"
            )
        
        return self

This is optional since the connection logic already handles these cases gracefully.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae298f4 and 508f1c0.

📒 Files selected for processing (4)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/model/data_source.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/athena/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/athena/test_query.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
  • manifest_str (31-32)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (2)
ibis-server/app/model/data_source.py (1)

250-291: Well-structured authentication flow implementation.

The refactored connection logic correctly implements three authentication methods with proper fallback behavior. The conditional structure ensures mutually exclusive credential flows, and all optional parameters are handled appropriately.

ibis-server/app/model/__init__.py (1)

112-166: Well-documented model with comprehensive authentication support.

The refactored AthenaConnectionInfo model provides clear support for all three authentication methods with excellent documentation. The optional fields and defaults are appropriate for the flexible authentication flows.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
ibis-server/app/model/__init__.py (1)

136-153: Consider validating web identity credential requirements.

The web identity fields are well-documented. However, web_identity_token requires role_arn to function correctly for AssumeRoleWithWebIdentity authentication. Consider adding validation to ensure they're provided together.

This can be added to the same @model_validator suggested for static credentials:

@model_validator(mode='after')
def validate_credential_pairs(self):
    # Validate web identity credentials
    if self.web_identity_token is not None and self.role_arn is None:
        raise ValueError(
            "role_arn must be provided when using web_identity_token"
        )
    
    return self
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

216-233: Fixture exists; remove verification concern but retain assertion improvement suggestion.

The connection_info_default_credential_chain fixture is properly defined at ibis-server/tests/routers/v3/connector/athena/conftest.py:35. The original concern about a missing fixture is resolved.

The test parametrization and fixture retrieval are correct. However, the assertion quality suggestion remains valid—the test only verifies response structure without validating actual query results or data correctness.

     assert response.status_code == 200
     result = response.json()
     assert "columns" in result
     assert "data" in result
+    assert len(result["columns"]) > 0
+    assert len(result["data"]) > 0
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 508f1c0 and 8aa3751.

📒 Files selected for processing (5)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/model/data_source.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_athena.py (3 hunks)
  • ibis-server/tests/routers/v3/connector/athena/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/athena/test_query.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/app/model/data_source.py
  • ibis-server/tests/routers/v3/connector/athena/conftest.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
  • connection_info (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (5)
ibis-server/tests/routers/v2/connector/test_athena.py (2)

62-62: LGTM: Test expectation updated to match new timestamp expression format.

The change from a bare timestamp literal to a CAST expression aligns with how Athena now represents timestamp values in the manifest.


116-116: The decimal precision change is intentional and backed by application logic, but needs manual verification against live Athena.

The totalprice dtype change from decimal128(15, 2) to decimal128(38, 9) is not a test-only modification. It's driven by the _round_decimal_columns method in ibis-server/app/model/connector.py (lines 210-217), which hardcodes precision to 38 (PyArrow's maximum) and defaults scale to 9 for all Decimal types.

This means every decimal column returned by Athena queries will be cast to decimal128(38, 9). The data values remain correct. However, verify that this aligns with the actual behavior of the Canner Ibis fork (canner/10.8.1) and current Athena type inference, as this precision appears to differ from what was previously expected.
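
The point that widening the scale leaves the values unchanged can be illustrated with Python's standard decimal module. This is only an analogy for the Arrow cast to decimal128(38, 9); it is not the connector's _round_decimal_columns code, and both the helper name widen_scale and the sample value are made up for illustration.

```python
from decimal import Decimal


def widen_scale(value: Decimal, scale: int = 9) -> Decimal:
    """Re-express a decimal at a larger scale without changing its numeric value."""
    # Decimal(1).scaleb(-9) is 1E-9, i.e. a template with nine fractional digits.
    return value.quantize(Decimal(1).scaleb(-scale))


original = Decimal("1234.56")   # scale 2, as in a decimal(15, 2) column
widened = widen_scale(original)  # scale 9, as in a decimal(38, 9) column

print(widened)              # 1234.560000000
print(widened == original)  # True
```

Only the declared scale changes, which is why the tests needed a new expected dtype but no new expected data values.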

ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

93-93: LGTM: Decimal precision update consistent with v2 tests.

The totalprice dtype update to decimal128(38, 9) aligns with the same change in v2 tests, reflecting Athena's default decimal precision.

ibis-server/app/model/__init__.py (2)

161-166: Verify the suggested default value approach—Pydantic v2 strict typing may reject plain strings.

The current implementation default=SecretStr("default") works but is not idiomatic for Pydantic v2. However, the review's suggestion to use default="default" may fail type validation since the field is typed as SecretStr | None and Pydantic v2 has strict type checking.

The recommended approach for Pydantic v2 is to use a default factory instead:

     schema_name: SecretStr | None = Field(
         alias="schema_name",
         description="The database name in Athena. Defaults to 'default'.",
         examples=["default"],
-        default=SecretStr("default"),
+        default_factory=lambda: SecretStr("default"),
     )

This is idiomatic for Pydantic v2 and avoids potential type coercion issues. However, verify that:

  1. The plain string default suggested in the review actually works with your Pydantic v2 configuration
  2. All code paths that rely on schema_name being a SecretStr (not a plain string) function correctly

119-133: No changes needed. The code already validates credential pairing correctly.

The review comment assumes no credential pairing validation exists, but the codebase already implements this on line 268 of ibis-server/app/model/data_source.py: the elif condition uses and, requiring both aws_access_key_id and aws_secret_access_key to be truthy. If only one is provided, the condition fails and the code gracefully falls back to the default AWS credential chain (environment variables, ~/.aws/credentials, or IAM role), which is the intended behavior per the code comments (lines 277-280).

This design is correct: it prevents incomplete static credentials from being used, yet remains flexible for cases where credentials are provided through multiple sources (model fields, environment variables, files).

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

223-229: Prefer base_url over a hard-coded path.

We already import base_url, so duplicating /v3/connector/athena here will drift if the route changes. Please reuse the shared constant for consistency.

-    response = await client.post(
-        url="/v3/connector/athena/query",
+    response = await client.post(
+        url=f"{base_url}/query",
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa3751 and 9205547.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v3/connector/athena/test_query.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)
  • manifest_str (137-138)
ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
  • connection_info (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

89-99: Updated decimal precision matches backend change.

The new expectation for decimal128(38, 9) keeps the test aligned with the Athena type mapping and looks good.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9205547 and 1e6a2fc.

📒 Files selected for processing (3)
  • ibis-server/app/model/data_source.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/athena/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/athena/test_query.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (3)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
  • manifest_str (31-32)
ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
  • connection_info (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci
  • GitHub Check: ci
🔇 Additional comments (6)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)

93-93: LGTM: Test expectation updated for higher precision decimal type.

The decimal precision update from decimal128(15, 2) to decimal128(38, 9) aligns with the changes in Athena connection handling and is consistently applied across test expectations.


216-262: LGTM: Parameterized test correctly exercises multiple authentication modes.

The test properly validates Athena connectivity across three authentication strategies (static credentials, default credential chain, and OIDC). All referenced fixtures are defined in conftest.py, and the test logic correctly mirrors the existing test_query function.

ibis-server/tests/routers/v3/connector/athena/conftest.py (2)

34-47: LGTM: Default credential chain fixture correctly validates environment setup.

The fixture appropriately checks for AWS credentials in the environment (to ensure the test can succeed) while intentionally omitting them from the connection info dict, allowing PyAthena to use the default credential chain resolution.


50-64: LGTM: OIDC fixture correctly configures web identity authentication.

The fixture properly reads OIDC-specific environment variables and provides all required fields (web_identity_token, role_arn) for the AssumeRoleWithWebIdentity flow. Using separate environment variable names for OIDC-specific configuration is a good practice.

ibis-server/app/model/data_source.py (2)

252-259: LGTM: Base connection parameters correctly initialized.

The kwargs dictionary properly initializes required fields (s3_staging_dir, schema_name) and conditionally adds the optional region_name, correctly using .get_secret_value() for SecretStr fields.


284-299: LGTM: Static credentials and default chain fallback correctly implemented.

The function properly handles static AWS credentials (with optional session token) and falls back to the default credential chain when neither OIDC nor static credentials are provided. The use of **kwargs for ibis.athena.connect is correct.

@douenergy douenergy requested a review from goldmedal on November 14, 2025 02:35
@douenergy douenergy force-pushed the athena-default-credencial-chain branch from ef97a49 to 44655e3 on November 14, 2025 05:50
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
ibis-server/app/model/__init__.py (1)

116-157: AthenaConnectionInfo additions align with credential modes; fix region_name typing

The new optional credential fields (static keys, session token, web identity, role ARN/session name) and the updated schema_name default are well-structured and match the intended auth flows (static, default chain, OIDC).

One remaining nit:

  • Lines 160–164: region_name is annotated as SecretStr but has default=None and the description says it’s optional. For consistency with its actual usage and with the other optional fields, this should be SecretStr | None.

This was already flagged in a previous review; updating the annotation would resolve the inconsistency between the type hint, default, and description.

Also applies to: 160-170

🧹 Nitpick comments (2)
ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

216-262: Parametrized Athena auth‑modes test looks good; reuse base_url for consistency

The new test_query_athena_modes correctly exercises the three connection info variants and verifies both data and dtypes, giving good coverage across static, default‑chain, and OIDC flows.

Minor suggestion: you could use f"{base_url}/query" instead of the hardcoded "/v3/connector/athena/query" to keep the endpoint path centralized with the rest of the file.

ibis-server/tests/routers/v3/connector/athena/conftest.py (1)

34-64: New credential‑mode fixtures are consistent with AthenaConnectionInfo

  • connection_info_default_credential_chain correctly omits explicit AWS keys so the connector can rely on the default credential chain, while still skipping cleanly when AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY aren’t present.
  • connection_info_oidc matches the new web‑identity/role fields (web_identity_token, role_arn) and uses separate OIDC‑specific env vars, which keeps the modes clearly separated.

If you ever want to test non‑env default‑chain sources (e.g., instance profiles), you might relax the env‑var check and instead rely on the connector to fail with a clear error, but the current skip behavior is fine for CI.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef97a49 and 44655e3.

📒 Files selected for processing (5)
  • ibis-server/app/model/__init__.py (1 hunks)
  • ibis-server/app/model/data_source.py (2 hunks)
  • ibis-server/tests/routers/v2/connector/test_athena.py (3 hunks)
  • ibis-server/tests/routers/v3/connector/athena/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/athena/test_query.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/model/data_source.py
🧰 Additional context used
🧬 Code graph analysis (1)
ibis-server/tests/routers/v3/connector/athena/test_query.py (2)
ibis-server/tests/routers/v3/connector/athena/test_functions.py (1)
  • manifest_str (31-32)
ibis-server/tests/routers/v3/connector/athena/conftest.py (1)
  • connection_info (24-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: ci
  • GitHub Check: ci
🔇 Additional comments (2)
ibis-server/tests/routers/v2/connector/test_athena.py (1)

61-64: timestamptz expression and widened decimal dtype look consistent

  • The explicit CAST(TIMESTAMP '2024-01-01 23:59:59 UTC' AS timestamp) keeps the same literal value while making the resulting type explicit, which should make type inference more predictable.
  • Updating totalprice to decimal128(38, 9) in both expectations aligns with the higher‑precision backend type and keeps v2 tests consistent with the v3 tests.

No functional issues spotted here.

Also applies to: 112-123, 152-163

ibis-server/tests/routers/v3/connector/athena/test_query.py (1)

89-99: Dtype update for totalprice matches backend precision

The widened totalprice expectation to decimal128(38, 9) is consistent with the other Athena tests and the higher‑precision backend type. No issues here.

Contributor

@goldmedal goldmedal left a comment

Thanks @douenergy 👍

@goldmedal goldmedal merged commit f4e007f into Canner:main Nov 17, 2025
6 of 7 checks passed