feat(substitute): add default header to model substitute#1152
feat(substitute): add default header to model substitute#1152goldmedal merged 7 commits intoCanner:mainfrom
Conversation
WalkthroughThe changes introduce enhanced handling of catalog and schema resolution for model substitution in SQL queries. The Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API_Endpoint
participant ModelSubstitute
Client->>API_Endpoint: POST /model-substitute (SQL, headers)
API_Endpoint->>ModelSubstitute: Instantiate with data_source, manifest_str, headers (dict)
ModelSubstitute->>ModelSubstitute: Resolve catalog/schema from source or headers
ModelSubstitute-->>API_Endpoint: Substituted SQL
API_Endpoint-->>Client: Response (substituted SQL or error)
Poem
Tip ⚡💬 Agentic Chat (Pro Plan, General Availability)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
ibis-server/app/mdl/substitute.py (1)
48-56: Consider handling None values from headers.get() more robustlyThe implementation follows a clear precedence order for determining catalog and schema values. However, when headers is not None but doesn't contain the expected keys,
headers.get()will return None, which could lead to "None" strings in the model lookup key.if source.catalog and source.db: catalog = source.catalog schema = source.db elif self.headers is not None: - catalog = self.headers.get("x-user-catalog") - schema = self.headers.get("x-user-schema") + catalog = self.headers.get("x-user-catalog") or "" + schema = self.headers.get("x-user-schema") or "" else: catalog = "" schema = ""ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (1)
33-215: Consider adding edge case tests for partial header information.The current tests cover the cases where both catalog and schema are provided either in SQL or via headers. Consider adding tests for edge cases such as:
- Only catalog provided in headers, schema missing
- Only schema provided in headers, catalog missing
- Empty or invalid header values
These tests would verify graceful handling of incomplete header information.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/app/mdl/substitute.py(2 hunks)ibis-server/app/routers/v2/connector.py(1 hunks)ibis-server/app/routers/v3/connector.py(4 hunks)ibis-server/tests/routers/v2/connector/test_postgres.py(6 hunks)ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py(6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
ibis-server/app/routers/v2/connector.py (2)
ibis-server/app/util.py (1)
build_context(98-101)ibis-server/app/mdl/substitute.py (2)
ModelSubstitute(12-58)substitute(20-37)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)
ibis-server/tests/conftest.py (1)
client(18-23)ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
connection_info(37-44)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (22)
ibis-server/app/mdl/substitute.py (1)
13-17: Good addition of optional headers parameter with proper default valueThe changes to the
ModelSubstituteconstructor properly add support for HTTP headers with a sensible default value ofNone. The defensive copy usingdict(headers)ensures that the original headers object won't be modified accidentally.ibis-server/app/routers/v2/connector.py (1)
271-284: Added request parameter and passed headers correctlyThe changes correctly add a
request: Requestparameter to the function signature and properly pass the request headers to theModelSubstituteconstructor. This implementation allows the model substitution logic to utilize the catalog and schema values from HTTP headers.ibis-server/app/routers/v3/connector.py (3)
3-3: Good addition of Request importThe Request import from FastAPI has been correctly added to support accessing request headers.
261-272: Added request parameter and passed headers correctlyThe changes correctly add a
request: Requestparameter to the function signature and properly pass the request headers to theModelSubstituteconstructor, consistent with the implementation in the v2 router.
286-286: Correctly updated the v2 fallback to include the request parameterThe fallback to v2 connector has been properly updated to include the request parameter, maintaining consistency between the v3 and v2 implementations.
ibis-server/tests/routers/v2/connector/test_postgres.py (6)
26-26: Good update to test dataThe test manifest has been correctly updated to include a "catalog" field in the tableReference, which is essential for testing the new functionality.
714-743: Comprehensive test coverage for header-based catalog and schemaThe test case has been thoroughly updated to verify both scenarios:
- When catalog and schema are explicitly specified in the SQL query
- When catalog and schema are provided via HTTP headers
This comprehensive testing ensures the functionality works correctly in all expected use cases.
750-789: Thorough testing of CTE with both approachesThe test for model substitution with CTEs has been properly extended to test both scenarios:
- With catalog and schema in the SQL
- With catalog and schema in the headers
This ensures the model substitution works correctly with more complex SQL constructs.
796-833: Comprehensive subquery testing with both approachesThe test for model substitution with subqueries has been thoroughly extended to verify both scenarios:
- With catalog and schema in the SQL
- With catalog and schema in the headers
This ensures the model substitution works correctly with nested queries.
840-863: Good error handling test for out-of-scope modelsThe tests properly verify that appropriate error messages are returned when models are out of scope, regardless of whether catalog and schema are specified in SQL or via headers.
870-893: Thorough testing of error cases for non-existent columnsThe tests verify that appropriate error messages are returned for non-existent columns, both when using catalog and schema in SQL and when using headers.
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (11)
15-15: Good addition of the catalog field to the model definition.The explicit addition of
"catalog": "test"to the tableReference aligns with the new testing strategy that compares SQL queries with explicit catalog/schema qualifiers against those using HTTP headers.
33-46: Tests properly verify catalog/schema substitution in SQL queries.Good implementation of the test case to verify that explicit catalog/schema references in SQL are correctly substituted with manifest values. The test correctly asserts that
"test"."public"."orders"is transformed to"my_catalog"."my_schema"."Orders".
48-65: Headers-based substitution test is implemented correctly.This test case properly validates that providing catalog and schema via HTTP headers works correctly when the SQL query contains only the table name. The assertion confirms that the model is correctly resolved even without explicit catalog/schema in the SQL.
69-87: CTE test with explicit catalog/schema is well structured.The test correctly verifies model substitution within a Common Table Expression when using explicit catalog and schema qualifiers in the SQL.
89-111: Headers-based CTE test is implemented correctly.Good implementation of the test case with catalog/schema headers for CTEs. This verifies that model substitution works properly within CTEs when catalog/schema context is provided via headers.
115-132: Subquery test with explicit catalog/schema is well structured.The test correctly verifies model substitution within a subquery when using explicit catalog and schema qualifiers in the SQL.
134-155: Headers-based subquery test is implemented correctly.Good implementation of the test case with catalog/schema headers for subqueries. This verifies that model substitution works properly within subqueries when catalog/schema context is provided via headers.
159-170: Error case testing for out-of-scope models with explicit catalog/schema.The test properly verifies that an appropriate error is returned when a non-existent model is referenced in the SQL.
171-185: Error case testing for out-of-scope models with headers.The test correctly verifies that the same error behavior is exhibited when using catalog/schema headers with a non-existent model.
191-201: Error case testing for non-existent columns with explicit catalog/schema.The test properly verifies that an appropriate error is returned when a non-existent column is referenced in the SQL with explicit catalog/schema qualifiers.
202-215: Error case testing for non-existent columns with headers.The test correctly verifies that the same error behavior is exhibited when using catalog/schema headers with a non-existent column.
goldmedal
left a comment
There was a problem hiding this comment.
Thanks @douenergy. It looks nice. However, I think we need to enhance the tests. I left some comments.
By the way, I found another potential issue at _build_model_dict.
This method is only used for the full qualified table reference. If a table reference isn't qualified, the hash key would be wrong.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
ibis-server/app/dependencies.py (1)
13-14: Consider adding a docstring to the new dependency function.While the function is simple, adding a docstring would improve maintainability by clearly indicating:
- The purpose of this function as a FastAPI dependency
- That it's specifically designed to support header extraction for model substitution
- The notable headers it's intended to provide (
x-user-catalogandx-user-schema)def get_wren_headers(request: Request) -> Headers: + """ + Extract HTTP headers from the incoming request. + + Primarily used to access catalog and schema context headers + ('x-user-catalog' and 'x-user-schema') for model substitution. + + Args: + request: The FastAPI request object + + Returns: + Headers object containing all request headers + """ return request.headers
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ibis-server/app/dependencies.py(1 hunks)ibis-server/app/routers/v2/connector.py(2 hunks)ibis-server/app/routers/v3/connector.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- ibis-server/app/routers/v2/connector.py
- ibis-server/app/routers/v3/connector.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/dependencies.py (1)
1-2: Appropriate imports added for new functionality.The added imports for
RequestfromfastapiandHeadersfromstarlette.datastructuresare necessary for the new header extraction functionality.
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
Show resolved
Hide resolved
|
Thanks @goldmedal for the review. I’ve switched to using |
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
ibis-server/app/mdl/substitute.py (3)
13-13: Consider adding a more specific type hint for the headers parameterThe new
headersparameter would benefit from a more specific type hint to indicate what type of object is expected (e.g.,Headers,Dict[str, str], etc.).-def __init__(self, data_source: DataSource, manifest_str: str, headers=None): +def __init__(self, data_source: DataSource, manifest_str: str, headers: Optional[Dict[str, str]] = None):Don't forget to add the corresponding imports:
from typing import Dict, Optional
57-68: Consider simplifying the fallback logicThe header fallback logic is correct but could be simplified to reduce repetition.
-# Determine catalog -if source.catalog: - catalog = source.catalog -else: - catalog = self.headers.get("x-user-catalog", "") if self.headers else "" - -# Determine schema -if source.db: - schema = source.db -else: - schema = self.headers.get("x-user-schema", "") if self.headers else "" +# Determine catalog and schema with fallback to headers +catalog = source.catalog or (self.headers.get("x-user-catalog", "") if self.headers else "") +schema = source.db or (self.headers.get("x-user-schema", "") if self.headers else "")
13-17: Add a docstring to explain the headers parameterConsider adding a docstring to explain the purpose of the new headers parameter and how it affects model substitution.
def __init__(self, data_source: DataSource, manifest_str: str, headers=None): + """ + Initialize a ModelSubstitute instance. + + Args: + data_source: The data source type + manifest_str: Base64-encoded JSON string containing manifest data + headers: Optional HTTP headers containing 'x-user-catalog' and 'x-user-schema' + for fallback catalog and schema values when not specified in SQL + """ self.data_source = data_source self.manifest = base64_to_dict(manifest_str) self.model_dict = self._build_model_dict(self.manifest["models"]) self.headers = dict(headers) if headers else None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
ibis-server/app/mdl/substitute.py(2 hunks)ibis-server/app/routers/v2/connector.py(8 hunks)ibis-server/app/routers/v3/connector.py(8 hunks)ibis-server/tests/routers/v2/connector/test_postgres.py(6 hunks)ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
- ibis-server/tests/routers/v2/connector/test_postgres.py
- ibis-server/app/routers/v2/connector.py
- ibis-server/app/routers/v3/connector.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
ibis-server/app/mdl/substitute.py (3)
ibis-server/app/model/data_source.py (1)
DataSource(42-67)ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (1)
manifest_str(28-29)ibis-server/app/util.py (1)
base64_to_dict(18-19)
🔇 Additional comments (3)
ibis-server/app/mdl/substitute.py (3)
17-17: Implementation looks goodGood approach to convert the headers to a dictionary if provided, making it easier to access header values later.
43-52: Well-structured key generation logicThe refactored key generation logic correctly handles different table reference formats (fully qualified, partially qualified, and unqualified), addressing the past review comment about key generation.
70-70:Details
❓ Verification inconclusive
Potential key inconsistency with model_dict
The lookup key
f"{catalog}.{schema}.{table}"always uses the fully qualified format, but_build_model_dictcreates keys in different formats based on the table reference. This might lead to lookup failures when the source doesn't have both catalog and schema.Let's check for potential inconsistencies in key generation vs lookup:
Consider updating the lookup logic to match the key generation logic:
-return self.model_dict.get(f"{catalog}.{schema}.{table}", None) +# Try fully qualified key first +model = self.model_dict.get(f"{catalog}.{schema}.{table}", None) +if model is not None: + return model + +# Try partially qualified key +if schema: + model = self.model_dict.get(f"{schema}.{table}", None) + if model is not None: + return model + +# Try unqualified key +return self.model_dict.get(table, None)
🏁 Script executed:
#!/bin/bash # Find all instances of model_dict usage to verify key consistency rg -A 3 "model_dict\.get" ibis-server/app/mdl/ # Check for any unit tests that might validate this behavior rg -A 10 "ModelSubstitute" ibis-server/tests/Length of output: 330
Attention: Potential Key Generation Mismatch in Model Lookup
The lookup in
ibis-server/app/mdl/substitute.pyuses a fully qualified key (f"{catalog}.{schema}.{table}"), but our investigation suggests that model keys might be generated in various formats (fully qualified, partially qualified, or unqualified). The current tests did not reveal any unit tests confirming the intended behavior of key generation, so please verify whether the key formats in_build_model_dicttruly differ from the lookup format.Recommendations:
- Confirm the key-building logic in
_build_model_dictto check for inconsistent key formats.- Consider adopting a multi-step lookup approach, for example:
- return self.model_dict.get(f"{catalog}.{schema}.{table}", None) + model = self.model_dict.get(f"{catalog}.{schema}.{table}", None) + if model is not None: + return model + + if schema: + model = self.model_dict.get(f"{schema}.{table}", None) + if model is not None: + return model + + return self.model_dict.get(table, None)- Manually verify that this change aligns with the intended behavior and existing unit tests (if any) in the codebase.
Please double-check these changes to ensure that all key-query scenarios are covered and that any adjustments in key generation logic are reconciled.
This PR adds support for two optional HTTP header
x-user-catalogandx-user-schemain the following endpoints:POST
/v2/connector/{data_source}/model-substitutePOST
/v3/connector/{data_source}/model-substituteThese headers allow clients to explicitly specify the catalog and schema context when performing model substitution.
Summary by CodeRabbit
New Features
Bug Fixes
x-user-catalogorx-user-schema) is provided.Tests