Skip to content

feat(substitute): add default header to model substitute#1152

Merged
goldmedal merged 7 commits intoCanner:mainfrom
douenergy:defaul-catalog-and-default-sechma
Apr 17, 2025
Merged

feat(substitute): add default header to model substitute#1152
goldmedal merged 7 commits intoCanner:mainfrom
douenergy:defaul-catalog-and-default-sechma

Conversation

@douenergy
Copy link
Copy Markdown
Contributor

@douenergy douenergy commented Apr 16, 2025

This PR adds support for two optional HTTP header x-user-catalog and x-user-schema in the following endpoints:

POST /v2/connector/{data_source}/model-substitute

POST /v3/connector/{data_source}/model-substitute

These headers allow clients to explicitly specify the catalog and schema context when performing model substitution.

Summary by CodeRabbit

  • New Features

    • Added support for specifying catalog and schema via HTTP headers across multiple endpoints, including model substitution.
  • Bug Fixes

    • Ensured consistent model substitution behavior regardless of whether catalog and schema are included in SQL queries or provided via headers.
    • Improved error handling when only one of the required headers (x-user-catalog or x-user-schema) is provided.
  • Tests

    • Expanded test coverage to validate model substitution with catalog and schema in both SQL and headers, covering success and error cases.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 16, 2025

Walkthrough

The changes introduce enhanced handling of catalog and schema resolution for model substitution in SQL queries. The ModelSubstitute class and its usage in API endpoints were updated to accept and utilize HTTP request headers for catalog and schema values, falling back to these headers if not explicitly provided in the SQL or manifest. API endpoint signatures in both v2 and v3 routers now include a dependency to inject headers as a Headers object. A new dependency function was added to extract headers from the request. Corresponding test suites were expanded to validate correct substitution behavior when catalog and schema are supplied either in the SQL or via headers, ensuring consistent and accurate model resolution.

Changes

File(s) Change Summary
ibis-server/app/mdl/substitute.py Updated ModelSubstitute constructor to accept optional headers dictionary. Modified _build_model_dict to generate fully qualified keys including catalog and schema when present. Changed _find_model to resolve catalog and schema from source or headers, defaulting to empty strings.
ibis-server/app/routers/v2/connector.py,
ibis-server/app/routers/v3/connector.py
Updated endpoint function signatures to replace string header parameter with dependency-injected Headers object. Modified instantiation of ModelSubstitute to pass headers as argument. In v3, reordered fallback call to v2 to align with new parameter order. Added imports for Headers and get_wren_headers dependency.
ibis-server/app/dependencies.py Added get_wren_headers function to extract and return HTTP headers from FastAPI Request as a Headers object.
ibis-server/tests/routers/v2/connector/test_postgres.py,
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
Enhanced tests to cover scenarios where catalog and schema are provided via SQL or HTTP headers. Updated manifests to include catalog in table reference. Added/clarified error assertions for out-of-scope models and non-existent columns.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API_Endpoint
    participant ModelSubstitute

    Client->>API_Endpoint: POST /model-substitute (SQL, headers)
    API_Endpoint->>ModelSubstitute: Instantiate with data_source, manifest_str, headers (dict)
    ModelSubstitute->>ModelSubstitute: Resolve catalog/schema from source or headers
    ModelSubstitute-->>API_Endpoint: Substituted SQL
    API_Endpoint-->>Client: Response (substituted SQL or error)
Loading

Poem

In the warren where models dwell,
Catalogs and schemas now resolve so well!
Whether in SQL or headers you hide,
The bunny will find you, far and wide.
Tests hop along, ensuring it's true—
Substitution works, no matter the view!
🐇✨

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ibis python Pull requests that update Python code labels Apr 16, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ibis-server/app/mdl/substitute.py (1)

48-56: Consider handling None values from headers.get() more robustly

The implementation follows a clear precedence order for determining catalog and schema values. However, when headers is not None but doesn't contain the expected keys, headers.get() will return None, which could lead to "None" strings in the model lookup key.

 if source.catalog and source.db:
     catalog = source.catalog
     schema = source.db
 elif self.headers is not None:
-    catalog = self.headers.get("x-user-catalog")
-    schema = self.headers.get("x-user-schema")
+    catalog = self.headers.get("x-user-catalog") or ""
+    schema = self.headers.get("x-user-schema") or ""
 else:
     catalog = ""
     schema = ""
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (1)

33-215: Consider adding edge case tests for partial header information.

The current tests cover the cases where both catalog and schema are provided either in SQL or via headers. Consider adding tests for edge cases such as:

  1. Only catalog provided in headers, schema missing
  2. Only schema provided in headers, catalog missing
  3. Empty or invalid header values

These tests would verify graceful handling of incomplete header information.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c984efb and 3334f9a.

📒 Files selected for processing (5)
  • ibis-server/app/mdl/substitute.py (2 hunks)
  • ibis-server/app/routers/v2/connector.py (1 hunks)
  • ibis-server/app/routers/v3/connector.py (4 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (6 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
ibis-server/app/routers/v2/connector.py (2)
ibis-server/app/util.py (1)
  • build_context (98-101)
ibis-server/app/mdl/substitute.py (2)
  • ModelSubstitute (12-58)
  • substitute (20-37)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)
ibis-server/tests/conftest.py (1)
  • client (18-23)
ibis-server/tests/routers/v3/connector/postgres/conftest.py (1)
  • connection_info (37-44)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (22)
ibis-server/app/mdl/substitute.py (1)

13-17: Good addition of optional headers parameter with proper default value

The changes to the ModelSubstitute constructor properly add support for HTTP headers with a sensible default value of None. The defensive copy using dict(headers) ensures that the original headers object won't be modified accidentally.

ibis-server/app/routers/v2/connector.py (1)

271-284: Added request parameter and passed headers correctly

The changes correctly add a request: Request parameter to the function signature and properly pass the request headers to the ModelSubstitute constructor. This implementation allows the model substitution logic to utilize the catalog and schema values from HTTP headers.

ibis-server/app/routers/v3/connector.py (3)

3-3: Good addition of Request import

The Request import from FastAPI has been correctly added to support accessing request headers.


261-272: Added request parameter and passed headers correctly

The changes correctly add a request: Request parameter to the function signature and properly pass the request headers to the ModelSubstitute constructor, consistent with the implementation in the v2 router.


286-286: Correctly updated the v2 fallback to include the request parameter

The fallback to v2 connector has been properly updated to include the request parameter, maintaining consistency between the v3 and v2 implementations.

ibis-server/tests/routers/v2/connector/test_postgres.py (6)

26-26: Good update to test data

The test manifest has been correctly updated to include a "catalog" field in the tableReference, which is essential for testing the new functionality.


714-743: Comprehensive test coverage for header-based catalog and schema

The test case has been thoroughly updated to verify both scenarios:

  1. When catalog and schema are explicitly specified in the SQL query
  2. When catalog and schema are provided via HTTP headers

This comprehensive testing ensures the functionality works correctly in all expected use cases.


750-789: Thorough testing of CTE with both approaches

The test for model substitution with CTEs has been properly extended to test both scenarios:

  1. With catalog and schema in the SQL
  2. With catalog and schema in the headers

This ensures the model substitution works correctly with more complex SQL constructs.


796-833: Comprehensive subquery testing with both approaches

The test for model substitution with subqueries has been thoroughly extended to verify both scenarios:

  1. With catalog and schema in the SQL
  2. With catalog and schema in the headers

This ensures the model substitution works correctly with nested queries.


840-863: Good error handling test for out-of-scope models

The tests properly verify that appropriate error messages are returned when models are out of scope, regardless of whether catalog and schema are specified in SQL or via headers.


870-893: Thorough testing of error cases for non-existent columns

The tests verify that appropriate error messages are returned for non-existent columns, both when using catalog and schema in SQL and when using headers.

ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (11)

15-15: Good addition of the catalog field to the model definition.

The explicit addition of "catalog": "test" to the tableReference aligns with the new testing strategy that compares SQL queries with explicit catalog/schema qualifiers against those using HTTP headers.


33-46: Tests properly verify catalog/schema substitution in SQL queries.

Good implementation of the test case to verify that explicit catalog/schema references in SQL are correctly substituted with manifest values. The test correctly asserts that "test"."public"."orders" is transformed to "my_catalog"."my_schema"."Orders".


48-65: Headers-based substitution test is implemented correctly.

This test case properly validates that providing catalog and schema via HTTP headers works correctly when the SQL query contains only the table name. The assertion confirms that the model is correctly resolved even without explicit catalog/schema in the SQL.


69-87: CTE test with explicit catalog/schema is well structured.

The test correctly verifies model substitution within a Common Table Expression when using explicit catalog and schema qualifiers in the SQL.


89-111: Headers-based CTE test is implemented correctly.

Good implementation of the test case with catalog/schema headers for CTEs. This verifies that model substitution works properly within CTEs when catalog/schema context is provided via headers.


115-132: Subquery test with explicit catalog/schema is well structured.

The test correctly verifies model substitution within a subquery when using explicit catalog and schema qualifiers in the SQL.


134-155: Headers-based subquery test is implemented correctly.

Good implementation of the test case with catalog/schema headers for subqueries. This verifies that model substitution works properly within subqueries when catalog/schema context is provided via headers.


159-170: Error case testing for out-of-scope models with explicit catalog/schema.

The test properly verifies that an appropriate error is returned when a non-existent model is referenced in the SQL.


171-185: Error case testing for out-of-scope models with headers.

The test correctly verifies that the same error behavior is exhibited when using catalog/schema headers with a non-existent model.


191-201: Error case testing for non-existent columns with explicit catalog/schema.

The test properly verifies that an appropriate error is returned when a non-existent column is referenced in the SQL with explicit catalog/schema qualifiers.


202-215: Error case testing for non-existent columns with headers.

The test correctly verifies that the same error behavior is exhibited when using catalog/schema headers with a non-existent column.

@goldmedal goldmedal self-requested a review April 16, 2025 06:31
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @douenergy. It looks nice. However, I think we need to enhance the tests. I left some comments.
By the way, I found another potential issue at _build_model_dict.
This method is only used for the full qualified table reference. If a table reference isn't qualified, the hash key would be wrong.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
ibis-server/app/dependencies.py (1)

13-14: Consider adding a docstring to the new dependency function.

While the function is simple, adding a docstring would improve maintainability by clearly indicating:

  1. The purpose of this function as a FastAPI dependency
  2. That it's specifically designed to support header extraction for model substitution
  3. The notable headers it's intended to provide (x-user-catalog and x-user-schema)
def get_wren_headers(request: Request) -> Headers:
+    """
+    Extract HTTP headers from the incoming request.
+    
+    Primarily used to access catalog and schema context headers
+    ('x-user-catalog' and 'x-user-schema') for model substitution.
+    
+    Args:
+        request: The FastAPI request object
+        
+    Returns:
+        Headers object containing all request headers
+    """
    return request.headers
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3334f9a and 1d33199.

📒 Files selected for processing (3)
  • ibis-server/app/dependencies.py (1 hunks)
  • ibis-server/app/routers/v2/connector.py (2 hunks)
  • ibis-server/app/routers/v3/connector.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/app/routers/v2/connector.py
  • ibis-server/app/routers/v3/connector.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/app/dependencies.py (1)

1-2: Appropriate imports added for new functionality.

The added imports for Request from fastapi and Headers from starlette.datastructures are necessary for the new header extraction functionality.

@douenergy
Copy link
Copy Markdown
Contributor Author

Thanks @goldmedal for the review. I’ve switched to using starlette headers and added a partial headers test

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
ibis-server/app/mdl/substitute.py (3)

13-13: Consider adding a more specific type hint for the headers parameter

The new headers parameter would benefit from a more specific type hint to indicate what type of object is expected (e.g., Headers, Dict[str, str], etc.).

-def __init__(self, data_source: DataSource, manifest_str: str, headers=None):
+def __init__(self, data_source: DataSource, manifest_str: str, headers: Optional[Dict[str, str]] = None):

Don't forget to add the corresponding imports:

from typing import Dict, Optional

57-68: Consider simplifying the fallback logic

The header fallback logic is correct but could be simplified to reduce repetition.

-# Determine catalog
-if source.catalog:
-    catalog = source.catalog
-else:
-    catalog = self.headers.get("x-user-catalog", "") if self.headers else ""
-
-# Determine schema
-if source.db:
-    schema = source.db
-else:
-    schema = self.headers.get("x-user-schema", "") if self.headers else ""
+# Determine catalog and schema with fallback to headers
+catalog = source.catalog or (self.headers.get("x-user-catalog", "") if self.headers else "")
+schema = source.db or (self.headers.get("x-user-schema", "") if self.headers else "")

13-17: Add a docstring to explain the headers parameter

Consider adding a docstring to explain the purpose of the new headers parameter and how it affects model substitution.

 def __init__(self, data_source: DataSource, manifest_str: str, headers=None):
+    """
+    Initialize a ModelSubstitute instance.
+    
+    Args:
+        data_source: The data source type
+        manifest_str: Base64-encoded JSON string containing manifest data
+        headers: Optional HTTP headers containing 'x-user-catalog' and 'x-user-schema'
+               for fallback catalog and schema values when not specified in SQL
+    """
     self.data_source = data_source
     self.manifest = base64_to_dict(manifest_str)
     self.model_dict = self._build_model_dict(self.manifest["models"])
     self.headers = dict(headers) if headers else None
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d574f7 and 7fcd70f.

📒 Files selected for processing (5)
  • ibis-server/app/mdl/substitute.py (2 hunks)
  • ibis-server/app/routers/v2/connector.py (8 hunks)
  • ibis-server/app/routers/v3/connector.py (8 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (6 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py
  • ibis-server/tests/routers/v2/connector/test_postgres.py
  • ibis-server/app/routers/v2/connector.py
  • ibis-server/app/routers/v3/connector.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
ibis-server/app/mdl/substitute.py (3)
ibis-server/app/model/data_source.py (1)
  • DataSource (42-67)
ibis-server/tests/routers/v3/connector/postgres/test_model_substitute.py (1)
  • manifest_str (28-29)
ibis-server/app/util.py (1)
  • base64_to_dict (18-19)
🔇 Additional comments (3)
ibis-server/app/mdl/substitute.py (3)

17-17: Implementation looks good

Good approach to convert the headers to a dictionary if provided, making it easier to access header values later.


43-52: Well-structured key generation logic

The refactored key generation logic correctly handles different table reference formats (fully qualified, partially qualified, and unqualified), addressing the past review comment about key generation.


70-70:

Details

❓ Verification inconclusive

Potential key inconsistency with model_dict

The lookup key f"{catalog}.{schema}.{table}" always uses the fully qualified format, but _build_model_dict creates keys in different formats based on the table reference. This might lead to lookup failures when the source doesn't have both catalog and schema.

Let's check for potential inconsistencies in key generation vs lookup:

Consider updating the lookup logic to match the key generation logic:

-return self.model_dict.get(f"{catalog}.{schema}.{table}", None)
+# Try fully qualified key first
+model = self.model_dict.get(f"{catalog}.{schema}.{table}", None)
+if model is not None:
+    return model
+
+# Try partially qualified key
+if schema:
+    model = self.model_dict.get(f"{schema}.{table}", None)
+    if model is not None:
+        return model
+
+# Try unqualified key
+return self.model_dict.get(table, None)

🏁 Script executed:

#!/bin/bash
# Find all instances of model_dict usage to verify key consistency
rg -A 3 "model_dict\.get" ibis-server/app/mdl/
# Check for any unit tests that might validate this behavior
rg -A 10 "ModelSubstitute" ibis-server/tests/

Length of output: 330


Attention: Potential Key Generation Mismatch in Model Lookup

The lookup in ibis-server/app/mdl/substitute.py uses a fully qualified key (f"{catalog}.{schema}.{table}"), but our investigation suggests that model keys might be generated in various formats (fully qualified, partially qualified, or unqualified). The current tests did not reveal any unit tests confirming the intended behavior of key generation, so please verify whether the key formats in _build_model_dict truly differ from the lookup format.

Recommendations:

  • Confirm the key-building logic in _build_model_dict to check for inconsistent key formats.
  • Consider adopting a multi-step lookup approach, for example:
    - return self.model_dict.get(f"{catalog}.{schema}.{table}", None)
    + model = self.model_dict.get(f"{catalog}.{schema}.{table}", None)
    + if model is not None:
    +     return model
    +
    + if schema:
    +     model = self.model_dict.get(f"{schema}.{table}", None)
    +     if model is not None:
    +         return model
    +
    + return self.model_dict.get(table, None)
  • Manually verify that this change aligns with the intended behavior and existing unit tests (if any) in the codebase.

Please double-check these changes to ensure that all key-query scenarios are covered and that any adjustments in key generation logic are reconciled.

Copy link
Copy Markdown
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great!

@goldmedal goldmedal merged commit 4d6cb94 into Canner:main Apr 17, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ibis python Pull requests that update Python code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants