feat(ibis): Add Athena connector #1207
Conversation
WalkthroughThe changes introduce full support for AWS Athena as a data source in the application. This includes new model classes, metadata handling, connection logic, and comprehensive tests. The CI workflow and project configuration are updated to enable Athena-specific testing and dependencies, ensuring Athena integration is properly validated and maintained. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API
participant AthenaMetadata
participant Athena (AWS)
participant S3
Client->>API: Send query/metadata request with Athena connection info
API->>AthenaMetadata: Instantiate with connection info
AthenaMetadata->>Athena: Connect using provided AWS credentials & S3 staging dir
AthenaMetadata->>Athena: Execute SQL (e.g., information_schema query)
Athena-->>AthenaMetadata: Return results
AthenaMetadata-->>API: Process and return structured response
API-->>Client: Respond with query/metadata results
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
ibis-server/app/model/__init__.py (1)
105-124: Consider removing redundant alias for schema_name field.The
AthenaConnectionInfoclass is well-structured with appropriate AWS/Athena-specific fields and proper use ofSecretStrfor security. However, theschema_namefield has a redundant alias that matches the field name.Consider this minor optimization:
- schema_name: SecretStr = Field( - alias="schema_name", - description="The database name in Athena", - examples=["default"], - ) + schema_name: SecretStr = Field( + description="The database name in Athena", + examples=["default"], + )🧰 Tools
🪛 Pylint (3.3.7)
[convention] 105-105: Missing class docstring
(C0115)
[refactor] 105-105: Too few public methods (1/2)
(R0903)
ibis-server/app/model/metadata/athena.py (2)
115-115: Update comment to reflect Athena instead of MySQL.The comment references MySQL documentation which is incorrect for Athena.
- # Numeric Types(https://dev.mysql.com/doc/refman/8.4/en/numeric-types.html) + # Numeric Types for Athena/Trino
131-131: Update comment to reflect Athena instead of MySQL.The comment references MySQL documentation which is incorrect for Athena.
- # Date and Time Types(https://dev.mysql.com/doc/refman/8.4/en/date-and-time-types.html) + # Date and Time Types for Athena/Trino
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
ibis-server/poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
.github/workflows/ibis-ci.yml(1 hunks)ibis-server/app/model/__init__.py(3 hunks)ibis-server/app/model/data_source.py(5 hunks)ibis-server/app/model/metadata/athena.py(1 hunks)ibis-server/app/model/metadata/factory.py(2 hunks)ibis-server/pyproject.toml(2 hunks)ibis-server/tests/routers/v2/connector/test_athena.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
ibis-server/app/model/metadata/factory.py (2)
ibis-server/app/model/metadata/athena.py (1)
AthenaMetadata(17-139)ibis-server/app/model/data_source.py (1)
DataSource(44-70)
ibis-server/app/model/data_source.py (1)
ibis-server/app/model/__init__.py (2)
AthenaConnectionInfo(105-123)QueryAthenaDTO(39-40)
ibis-server/tests/routers/v2/connector/test_athena.py (2)
wren-core-base/manifest-macro/src/lib.rs (1)
manifest(26-56)ibis-server/tests/conftest.py (1)
client(18-23)
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py
[convention] 45-45: Line too long (103/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'pandas'
(E0401)
[convention] 17-17: Missing class docstring
(C0115)
ibis-server/app/model/__init__.py
[convention] 39-39: Missing class docstring
(C0115)
[refactor] 39-39: Too few public methods (0/2)
(R0903)
[convention] 105-105: Missing class docstring
(C0115)
[refactor] 105-105: Too few public methods (1/2)
(R0903)
ibis-server/app/model/data_source.py
[convention] 45-45: Class constant name "athena" doesn't conform to UPPER_CASE naming style
(C0103)
[convention] 74-74: Class constant name "athena" doesn't conform to UPPER_CASE naming style
(C0103)
[convention] 105-105: Missing function or method docstring
(C0116)
ibis-server/tests/routers/v2/connector/test_athena.py
[convention] 43-43: Line too long (105/100)
(C0301)
[warning] 107-107: ## fixme this should be float64
(W0511)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'orjson'
(E0401)
[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style
(C0103)
[convention] 73-73: Missing function or method docstring
(C0116)
[convention] 77-77: Missing function or method docstring
(C0116)
[warning] 77-77: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 117-117: Missing function or method docstring
(C0116)
[warning] 117-117: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 145-145: Missing function or method docstring
(C0116)
[convention] 161-161: Missing function or method docstring
(C0116)
[warning] 161-161: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 174-174: Missing function or method docstring
(C0116)
[warning] 174-174: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 190-190: Missing function or method docstring
(C0116)
[warning] 190-190: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 203-203: Missing function or method docstring
(C0116)
[warning] 203-203: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 217-217: Missing function or method docstring
(C0116)
[warning] 217-217: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 232-232: Missing function or method docstring
(C0116)
[warning] 232-232: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 244-244: Missing function or method docstring
(C0116)
[warning] 245-245: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 268-268: Missing function or method docstring
(C0116)
[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 281-281: Missing function or method docstring
(C0116)
[warning] 282-282: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 307-307: Missing function or method docstring
(C0116)
[convention] 331-331: Missing function or method docstring
(C0116)
[convention] 343-343: Missing function or method docstring
(C0116)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (19)
ibis-server/app/model/metadata/factory.py (2)
2-2: LGTM: Clean import following established pattern.The import of
AthenaMetadatafollows the same pattern as other metadata imports and maintains alphabetical ordering.
21-21: LGTM: Consistent factory mapping implementation.The mapping entry correctly associates
DataSource.athenawithAthenaMetadata, following the established factory pattern used by other data sources.ibis-server/pyproject.toml (2)
14-14: LGTM: Correct dependency configuration for Athena support.Adding "athena" to the ibis-framework extras properly enables the Athena backend support, following the same pattern as other database backends.
70-70: LGTM: Consistent pytest marker addition.The athena test marker follows the established pattern and enables selective test execution for Athena-specific tests.
.github/workflows/ibis-ci.yml (1)
93-100: LGTM: Well-structured conditional Athena testing step.The implementation correctly follows the established pattern used by BigQuery and Snowflake conditional tests. The environment variables appropriately map to AWS credentials and Athena-specific configuration needed for testing.
ibis-server/app/model/__init__.py (2)
39-41: LGTM: Consistent DTO implementation.The
QueryAthenaDTOclass correctly follows the established pattern used by other database-specific query DTOs, properly typing the connection_info field.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 39-39: Missing class docstring
(C0115)
[refactor] 39-39: Too few public methods (0/2)
(R0903)
367-368: LGTM: Proper union type extension.The addition of
AthenaConnectionInfoto theConnectionInfounion type is correctly placed and maintains consistency with the existing type union structure.ibis-server/app/model/data_source.py (3)
14-14: LGTM! Imports are correctly added.The imports for
AthenaConnectionInfoandQueryAthenaDTOare properly included and follow the existing import pattern.Also applies to: 23-23
45-45: LGTM! Enum additions follow the established pattern.The
athenaenum members are correctly added to bothDataSourceandDataSourceExtensionenums, maintaining consistency with other data sources.Also applies to: 74-74
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 45-45: Class constant name "athena" doesn't conform to UPPER_CASE naming style
(C0103)
104-112: LGTM! Connection method implementation is correct.The
get_athena_connectionmethod properly:
- Follows the established pattern for static connection methods
- Correctly extracts secret values from the connection info
- Uses the appropriate
ibis.athena.connectparametersThe implementation is consistent with other connector methods in the file.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 105-105: Missing function or method docstring
(C0116)
ibis-server/app/model/metadata/athena.py (3)
17-21: LGTM! Class structure follows the established pattern.The
AthenaMetadataclass properly inherits fromMetadataand initializes the connection using the data source management system.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 17-17: Missing class docstring
(C0115)
93-95: LGTM! Constraint handling is correctly implemented.Properly returns an empty list since Athena doesn't support foreign key constraints, with clear documentation.
97-98: LGTM! Version method provides appropriate information.Returns a descriptive version string that accurately reflects AWS Athena's versioning approach.
ibis-server/tests/routers/v2/connector/test_athena.py (6)
9-19: LGTM! Test configuration is well-structured.The test setup properly:
- Uses pytest markers for test categorization
- Configures environment-based connection info with sensible defaults
- Follows established patterns for test configuration
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style
(C0103)
20-69: LGTM! Comprehensive test manifest structure.The manifest provides excellent test coverage with:
- Various column types (integer, varchar, float, date, timestamp, bytea)
- Complex expressions and type casting
- Edge cases like NULL values
- Realistic test data structure
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 43-43: Line too long (105/100)
(C0301)
77-102: LGTM! Thorough query result validation.The test properly validates:
- Response structure and status code
- Data consistency with expected results
- Column count matches manifest
- Data types are correctly returned
The assertion structure is comprehensive and reliable.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 77-77: Missing function or method docstring
(C0116)
[warning] 77-77: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
117-143: LGTM! Limit parameter testing is well-implemented.The test effectively validates that the limit parameter works correctly in both scenarios:
- When applied via query parameter
- When SQL already contains a LIMIT clause (parameter takes precedence)
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 117-117: Missing function or method docstring
(C0116)
[warning] 117-117: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
145-305: LGTM! Comprehensive error scenario testing.Excellent coverage of error cases:
- Missing required fields (manifestStr, sql, connectionInfo)
- Invalid parameters for validation rules
- Proper HTTP status codes and error messages
This ensures robust error handling in the API.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 145-145: Missing function or method docstring
(C0116)
[convention] 161-161: Missing function or method docstring
(C0116)
[warning] 161-161: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 174-174: Missing function or method docstring
(C0116)
[warning] 174-174: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 190-190: Missing function or method docstring
(C0116)
[warning] 190-190: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 203-203: Missing function or method docstring
(C0116)
[warning] 203-203: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 217-217: Missing function or method docstring
(C0116)
[warning] 217-217: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 232-232: Missing function or method docstring
(C0116)
[warning] 232-232: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 244-244: Missing function or method docstring
(C0116)
[warning] 245-245: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 268-268: Missing function or method docstring
(C0116)
[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
[convention] 281-281: Missing function or method docstring
(C0116)
[warning] 282-282: Redefining name 'manifest_str' from outer scope (line 73)
(W0621)
307-351: LGTM! Metadata endpoint testing is comprehensive.The tests properly validate:
- Table listing functionality
- Constraint handling (correctly expects empty list for Athena)
- Version information retrieval
- Response structure validation
Good coverage of all metadata endpoints.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 307-307: Missing function or method docstring
(C0116)
[convention] 331-331: Missing function or method docstring
(C0116)
[convention] 343-343: Missing function or method docstring
(C0116)
goldmedal
left a comment
There was a problem hiding this comment.
Thanks, @douenergy. Overall, it looks good to me. I left some minor comments. It's better to add the test case for the glue table.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ibis-server/app/model/metadata/athena.py (1)
25-43: SQL injection vulnerability still exists despite previous discussion.The direct string interpolation of
schema_nameremains a security concern. Since the previous parameterized approach didn't work with Ibis raw_sql, implement proper identifier validation and quoting as discussed in the previous review.Apply the validation and quoting approach:
+ def _validate_and_quote_identifier(self, identifier: str) -> str: + """Validate and quote SQL identifier to prevent injection.""" + import re + + # Remove any existing quotes and validate + clean_identifier = identifier.strip().strip('"').strip("'") + + # Only allow alphanumeric characters, underscores, and hyphens + if not re.match(r'^[a-zA-Z0-9_-]+$', clean_identifier): + raise ValueError(f"Invalid schema name: {clean_identifier}") + + # Quote the identifier using double quotes (standard SQL) + return f'"{clean_identifier}"' def get_table_list(self) -> list[Table]: schema_name = self.connection_info.schema_name.get_secret_value() + quoted_schema = self._validate_and_quote_identifier(schema_name) sql = f""" SELECT t.table_catalog, t.table_schema, t.table_name, c.column_name, c.ordinal_position, c.is_nullable, c.data_type FROM information_schema.tables AS t JOIN information_schema.columns AS c ON t.table_catalog = c.table_catalog AND t.table_schema = c.table_schema AND t.table_name = c.table_name - WHERE t.table_schema = '{schema_name}' + WHERE t.table_schema = {quoted_schema} ORDER BY t.table_name """
🧹 Nitpick comments (4)
ibis-server/app/model/metadata/athena.py (3)
1-14: Add module docstring for better documentation.Consider adding a module-level docstring to describe the purpose of this Athena metadata implementation.
+"""AWS Athena metadata implementation for retrieving tables, columns, and constraints.""" import re import pandas as pd🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'pandas'
(E0401)
17-21: Add class docstring for better documentation.The class would benefit from a docstring explaining its purpose and usage.
class AthenaMetadata(Metadata): + """Metadata implementation for AWS Athena data source. + + Provides functionality to retrieve table definitions, column information, + and constraints from Athena's information_schema. + """ def __init__(self, connection_info: AthenaConnectionInfo):🧰 Tools
🪛 Pylint (3.3.7)
[convention] 17-17: Missing class docstring
(C0115)
103-137: Consider improving type mapping coverage and organization.The type mapping includes some MySQL-specific types that may not be relevant for Athena. Consider organizing by category and ensuring Athena-specific types are properly covered.
def _transform_column_type(self, data_type): data_type = re.sub(r"\(.*\)", "", data_type).strip() switcher = { - # String Types (ignore Binary and Spatial Types for now) + # String Types "char": RustWrenEngineColumnType.CHAR, "varchar": RustWrenEngineColumnType.VARCHAR, - "tinytext": RustWrenEngineColumnType.TEXT, "text": RustWrenEngineColumnType.TEXT, - "mediumtext": RustWrenEngineColumnType.TEXT, - "longtext": RustWrenEngineColumnType.TEXT, - "enum": RustWrenEngineColumnType.VARCHAR, - "set": RustWrenEngineColumnType.VARCHAR, - "bit": RustWrenEngineColumnType.TINYINT, + "string": RustWrenEngineColumnType.VARCHAR, # Athena string type + + # Numeric Types "tinyint": RustWrenEngineColumnType.TINYINT, "smallint": RustWrenEngineColumnType.SMALLINT, - "mediumint": RustWrenEngineColumnType.INTEGER, "int": RustWrenEngineColumnType.INTEGER, "integer": RustWrenEngineColumnType.INTEGER, "bigint": RustWrenEngineColumnType.BIGINT, - # boolean + + # Boolean "bool": RustWrenEngineColumnType.BOOL, "boolean": RustWrenEngineColumnType.BOOL, - # Decimal + + # Decimal/Float "float": RustWrenEngineColumnType.FLOAT4, "double": RustWrenEngineColumnType.DOUBLE, "decimal": RustWrenEngineColumnType.DECIMAL, "numeric": RustWrenEngineColumnType.NUMERIC, + + # Date/Time "date": RustWrenEngineColumnType.DATE, "datetime": RustWrenEngineColumnType.TIMESTAMP, "timestamp": RustWrenEngineColumnType.TIMESTAMPTZ, - # JSON Type + + # JSON "json": RustWrenEngineColumnType.JSON, + + # Binary + "binary": RustWrenEngineColumnType.BYTEA, + "varbinary": RustWrenEngineColumnType.BYTEA, }ibis-server/tests/routers/v2/connector/test_athena.py (1)
1-9: Add module docstring to describe test purpose.Consider adding a module-level docstring to describe what this test module covers.
+"""Tests for Athena connector v2 API endpoints. + +Tests cover query execution, validation rules, and metadata retrieval +for both standard Athena and Glue-based databases. +""" import base64 import os🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'orjson'
(E0401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ibis-server/app/model/metadata/athena.py(1 hunks)ibis-server/tests/routers/v2/connector/test_athena.py(1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py
[convention] 45-45: Line too long (103/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 3-3: Unable to import 'pandas'
(E0401)
[convention] 17-17: Missing class docstring
(C0115)
ibis-server/tests/routers/v2/connector/test_athena.py
[convention] 54-54: Line too long (105/100)
(C0301)
[warning] 118-118: ## fixme this should be float64
(W0511)
[warning] 158-158: ## fixme this should be float64
(W0511)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'orjson'
(E0401)
[convention] 11-11: Constant name "base_url" doesn't conform to UPPER_CASE naming style
(C0103)
[convention] 84-84: Missing function or method docstring
(C0116)
[convention] 88-88: Missing function or method docstring
(C0116)
[warning] 88-88: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 128-128: Missing function or method docstring
(C0116)
[warning] 128-128: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 168-168: Missing function or method docstring
(C0116)
[warning] 168-168: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 196-196: Missing function or method docstring
(C0116)
[convention] 212-212: Missing function or method docstring
(C0116)
[warning] 212-212: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 225-225: Missing function or method docstring
(C0116)
[warning] 225-225: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 241-241: Missing function or method docstring
(C0116)
[warning] 241-241: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 254-254: Missing function or method docstring
(C0116)
[warning] 254-254: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 268-268: Missing function or method docstring
(C0116)
[warning] 268-268: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 283-283: Missing function or method docstring
(C0116)
[warning] 283-283: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 295-295: Missing function or method docstring
(C0116)
[warning] 296-296: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 319-319: Missing function or method docstring
(C0116)
[warning] 319-319: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 332-332: Missing function or method docstring
(C0116)
[warning] 333-333: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
[convention] 358-358: Missing function or method docstring
(C0116)
[convention] 382-382: Missing function or method docstring
(C0116)
[convention] 394-394: Missing function or method docstring
(C0116)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (4)
ibis-server/tests/routers/v2/connector/test_athena.py (4)
88-126: Comprehensive query test with good data validation.The test effectively validates query execution, column count, data content, and data types. The assertions are thorough and check both structure and content.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 118-118: ## fixme this should be float64
(W0511)
[convention] 88-88: Missing function or method docstring
(C0116)
[warning] 88-88: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
128-166: Excellent addition of Glue database testing.Testing with Glue-based databases ensures the connector works with different Athena deployment patterns. The test structure mirrors the standard test appropriately.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 158-158: ## fixme this should be float64
(W0511)
[convention] 128-128: Missing function or method docstring
(C0116)
[warning] 128-128: Redefining name 'manifest_str' from outer scope (line 84)
(W0621)
358-380: Robust metadata table listing test.The test properly validates the metadata response structure and includes conditional logic to check for the test table's existence, which makes it resilient to different test environments.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 358-358: Missing function or method docstring
(C0116)
382-392: Correct validation of Athena constraints limitation.The test correctly validates that Athena returns an empty constraints list, which aligns with Athena's lack of foreign key constraint support.
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 382-382: Missing function or method docstring
(C0116)
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
ibis-server/app/model/metadata/athena.py (1)
26-44: The SQL injection vulnerability remains unaddressed.Based on the previous discussion, parameterized queries don't work with Ibis
raw_sql. The validation and quoting approach suggested in the follow-up was not implemented.Consider implementing the validation approach:
def _validate_and_quote_identifier(self, identifier: str) -> str: """Validate and quote SQL identifier to prevent injection.""" import re clean_identifier = identifier.strip().strip('"').strip("'") if not re.match(r'^[a-zA-Z0-9_-]+$', clean_identifier): raise ValueError(f"Invalid schema name: {clean_identifier}") return f'"{clean_identifier}"'Then use:
WHERE t.table_schema = {self._validate_and_quote_identifier(schema_name)}
🧹 Nitpick comments (3)
ibis-server/app/model/metadata/athena.py (3)
1-16: Add module and class documentation.The static analysis correctly identifies missing docstrings. Add proper documentation to improve code maintainability.
+""" +AWS Athena metadata implementation for retrieving schema information. + +This module provides AthenaMetadata class that connects to AWS Athena +and retrieves table and column metadata using information_schema queries. +""" import re from contextlib import closing import pandas as pd🧰 Tools
🪛 Pylint (3.3.7)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'pandas'
(E0401)
18-22: Add class documentation.class AthenaMetadata(Metadata): + """ + Metadata implementation for AWS Athena data source. + + Provides methods to retrieve table schemas, columns, and constraints + from Athena using information_schema queries. + """ def __init__(self, connection_info: AthenaConnectionInfo):🧰 Tools
🪛 Pylint (3.3.7)
[convention] 18-18: Missing class docstring
(C0115)
46-47: Fix line length formatting issue.- # We need to use raw_sql here because using the sql method causes Ibis to *create view* first, - # which does not work with information_schema queries. + # We need to use raw_sql here because using the sql method causes Ibis to + # *create view* first, which does not work with information_schema queries.🧰 Tools
🪛 Pylint (3.3.7)
[convention] 46-46: Line too long (103/100)
(C0301)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
ibis-server/app/model/connector.py(3 hunks)ibis-server/app/model/metadata/athena.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- ibis-server/app/model/connector.py
🧰 Additional context used
🪛 Pylint (3.3.7)
ibis-server/app/model/metadata/athena.py
[convention] 46-46: Line too long (103/100)
(C0301)
[convention] 1-1: Missing module docstring
(C0114)
[error] 4-4: Unable to import 'pandas'
(E0401)
[convention] 18-18: Missing class docstring
(C0115)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/model/metadata/athena.py (1)
48-51: Good implementation of cursor management.The use of
closing()context manager properly addresses the resource management concern from previous reviews.
Description
Summary by CodeRabbit
New Features
Tests
Chores