Skip to content

fix(core): add missing variant and fix quoted issue for Oracle#1178

Merged
douenergy merged 6 commits intoCanner:mainfrom
goldmedal:fix/oracle-ds
May 6, 2025
Merged

fix(core): add missing variant and fix quoted issue for Oracle#1178
douenergy merged 6 commits intoCanner:mainfrom
goldmedal:fix/oracle-ds

Conversation

@goldmedal
Copy link
Copy Markdown
Contributor

@goldmedal goldmedal commented May 2, 2025

Summary by CodeRabbit

  • New Features
    • Added support for Oracle as a data source, including dialect-specific identifier quoting and manifest enhancements.
    • Introduced integration tests for Oracle database connections and queries, validating data retrieval, function execution, and type handling.
  • Tests
    • Added new test fixtures and modules for Oracle integration, including containerized test setup, API endpoint validation, and function list retrieval.
    • Updated test manifests to explicitly identify Oracle as the data source.
  • Refactor
    • Improved identifier quoting logic by delegating to inner dialects before applying custom rules.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented May 2, 2025

"""

Walkthrough

This set of changes introduces Oracle database support across both the Rust core and Python testing layers. The Rust codebase adds an Oracle variant to the DataSource enum, updates display and dialect logic to handle Oracle-specific identifier quoting, and enables the dialect system to recognize and apply Oracle rules. On the Python side, new test fixtures and integration tests are added for Oracle, including containerized setup, schema initialization, and API endpoint validation. Test manifests and metadata are updated to include Oracle as a data source.

Changes

Files/Paths Change Summary
ibis-server/tests/routers/v2/connector/test_oracle.py Added "dataSource": "oracle" to the test manifest dictionary and added finalizer to stop Oracle container after tests.
ibis-server/tests/routers/v3/connector/oracle/conftest.py Introduced Oracle test fixtures using testcontainers, with container startup, schema/data setup, connection info/url fixtures, and pytest marker.
ibis-server/tests/routers/v3/connector/oracle/test_query.py Added integration tests for Oracle /query endpoint, including manifest fixture, connection info/url fixtures, and validation of query results and types.
ibis-server/tests/routers/v3/connector/oracle/test_function.py Added integration tests for Oracle functions API and SQL scalar/aggregate function execution, with manifest fixture and connection info.
wren-core-base/manifest-macro/src/lib.rs Added Oracle variant to DataSource enum with serde alias "oracle".
wren-core-base/src/mdl/manifest.rs Extended Display implementation for DataSource to handle Oracle variant.
wren-core/core/src/mdl/dialect/inner_dialect.rs Added identifier_quote_style method to InnerDialect trait; implemented OracleDialect with Oracle-specific quoting logic; updated factory function to return OracleDialect.
wren-core/core/src/mdl/dialect/wren_dialect.rs Modified identifier_quote_style in WrenDialect to delegate to inner dialect before applying Wren-specific logic.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Suite
    participant Pytest as Pytest Fixtures
    participant Oracle as Oracle Container
    participant API as /query Endpoint

    Test->>Pytest: Request Oracle fixture
    Pytest->>Oracle: Start container, setup schema and data
    Pytest-->>Test: Provide connection info and connection URL
    Test->>API: POST /query with Oracle manifest and SQL query
    API->>API: Deserialize manifest, recognize Oracle data source
    API->>Oracle: Execute query via SQL engine
    Oracle-->>API: Return query results
    API-->>Test: Respond with data and type information
Loading
sequenceDiagram
    participant Core as Rust Core
    participant Manifest as DataSource Enum
    participant Dialect as InnerDialect System

    Manifest->>Core: Deserialize manifest with "oracle"
    Core->>Manifest: Recognize DataSource::Oracle variant
    Core->>Dialect: get_inner_dialect(DataSource::Oracle)
    Dialect->>Dialect: Use OracleDialect for identifier quoting
    Dialect-->>Core: Provide Oracle-specific quoting rules
Loading

Assessment against linked issues

Objective Addressed Explanation
Add ORACLE datasource variant to DataSource enum in manifest-macro (#1177)
Ensure serialization/deserialization recognizes Oracle data source (#1177)

Possibly related PRs

Suggested labels

documentation

Suggested reviewers

  • douenergy

Poem

In the warren of code, a new path unfurled,
Oracle now joins our database world!
With fixtures and dialects, and quoting just right,
Our queries to Oracle now take flight.
Rust and Python, together they sing—
"Welcome, dear Oracle, let your data spring!"
🐇✨
"""


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f20e3c1 and c6ab3aa.

📒 Files selected for processing (2)
  • ibis-server/tests/routers/v2/connector/test_oracle.py (2 hunks)
  • ibis-server/tests/routers/v3/connector/oracle/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • ibis-server/tests/routers/v2/connector/test_oracle.py
  • ibis-server/tests/routers/v3/connector/oracle/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: ci
  • GitHub Check: cargo test (win64)
  • GitHub Check: test
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added core ibis rust Pull requests that update Rust code python Pull requests that update Python code labels May 2, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
wren-core/core/src/mdl/dialect/inner_dialect.rs (1)

111-114: Fix typo in variable name.

There's a small typo in the variable name on line 112.

-    let uppsercase = sql.to_uppercase();
+    let uppercase = sql.to_uppercase();
ibis-server/tests/routers/v3/connector/oracle/test_query.py (1)

116-134: Consider adding full data validation in connection URL test.

While this test validates the basic functionality of querying with a connection URL, it only checks the first data element rather than the full dataset as in the previous test.

For consistency with the first test, consider adding the same comprehensive data validation:

    assert result["data"][0][0] == 1
+   assert result["data"][0] == [
+       1,
+       370,
+       "O",
+       "172799.49",
+       "1996-01-02 00:00:00.000000",
+       "1_370",
+       "2024-01-01 23:59:59.000000",
+       "2024-01-01 23:59:59.000000 UTC",
+       None,
+   ]
+   assert result["dtypes"] == {
+       "orderkey": "int64",
+       "custkey": "int64",
+       "orderstatus": "object",
+       "totalprice": "object",
+       "orderdate": "object",
+       "order_cust_key": "object",
+       "timestamp": "object",
+       "timestamptz": "object",
+       "test_null_time": "datetime64[ns]",
+   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2380ae8 and 0106700.

⛔ Files ignored due to path filters (1)
  • ibis-server/resources/function_list/oracle.csv is excluded by !**/*.csv
📒 Files selected for processing (7)
  • ibis-server/tests/routers/v2/connector/test_oracle.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/oracle/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/oracle/test_query.py (1 hunks)
  • wren-core-base/manifest-macro/src/lib.rs (1 hunks)
  • wren-core-base/src/mdl/manifest.rs (1 hunks)
  • wren-core/core/src/mdl/dialect/inner_dialect.rs (3 hunks)
  • wren-core/core/src/mdl/dialect/wren_dialect.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
wren-core-base/src/mdl/manifest.rs (1)
ibis-server/app/model/data_source.py (1)
  • DataSource (42-67)
ibis-server/tests/routers/v3/connector/oracle/conftest.py (2)
ibis-server/tests/routers/v2/connector/test_oracle.py (1)
  • oracle (120-144)
ibis-server/tests/conftest.py (1)
  • file_path (10-11)
wren-core/core/src/mdl/dialect/inner_dialect.rs (2)
wren-core/core/src/mdl/dialect/wren_dialect.rs (2)
  • identifier_quote_style (40-54)
  • new (103-107)
ibis-server/app/model/data_source.py (1)
  • DataSource (42-67)
ibis-server/tests/routers/v3/connector/oracle/test_query.py (2)
ibis-server/tests/routers/v3/connector/oracle/conftest.py (3)
  • oracle (27-51)
  • connection_info (55-64)
  • connection_url (68-70)
ibis-server/tests/conftest.py (1)
  • client (18-23)
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: ci
  • GitHub Check: test
  • GitHub Check: cargo check
🔇 Additional comments (17)
ibis-server/tests/routers/v2/connector/test_oracle.py (1)

23-23: Addition of Oracle data source specification in the manifest.

The change to explicitly specify "dataSource": "oracle" in the test manifest ensures proper dialect selection and identifier quoting logic for Oracle database tests. This aligns with the new Oracle data source support added to the system.

wren-core-base/manifest-macro/src/lib.rs (1)

105-106: Added Oracle variant to DataSource enum.

This change adds Oracle as a supported data source type with the appropriate serde alias "oracle", correctly implementing the missing Oracle data source variant mentioned in the PR objectives.

wren-core-base/src/mdl/manifest.rs (1)

109-109: Added Display implementation for Oracle data source.

The implementation of the Display trait for the Oracle variant follows the consistent pattern of uppercase formatting used for other data sources. This supports proper serialization and display of Oracle data source identifiers.

wren-core/core/src/mdl/dialect/wren_dialect.rs (1)

41-43:

Details

✅ Verification successful

Enhanced identifier quoting to delegate to inner dialect.

This change enables Oracle-specific identifier quoting by first delegating to the inner dialect's quote style logic. It addresses the PR objective to fix quotation of identifiers for Oracle that aren't entirely uppercase.


🏁 Script executed:

#!/bin/bash
# Verify that the Oracle dialect implementation exists and handles identifier quoting

# Look for the Oracle dialect implementation
echo "Searching for Oracle dialect implementation..."
rg -A 10 "OracleDialect" --type rust

# Look for the specific identifier quoting implementation for Oracle
echo "Searching for Oracle identifier quoting implementation..."
rg -A 10 "identifier_quote_style.*Oracle" --type rust

Length of output: 2305


✅ Delegation to inner dialect correctly handles Oracle quoting

  • Verified that OracleDialect in wren-core/core/src/mdl/dialect/inner_dialect.rs implements
    fn identifier_quote_style(&self, identifier: &str) -> Option<char>
    and returns Some('"') for lowercase identifiers, keywords, or non-matching patterns.
  • The change in wren-core/core/src/mdl/dialect/wren_dialect.rs (lines 41–43) now correctly defers to this inner logic, fulfilling the PR’s goal to quote non-uppercase Oracle identifiers.
wren-core/core/src/mdl/dialect/inner_dialect.rs (4)

23-24: LGTM! Required dependencies added for the Oracle dialect.

These imports add the necessary dependencies for implementing Oracle's identifier quoting logic: ALL_KEYWORDS to check if an identifier is a reserved keyword and Regex for validating identifier patterns.

Also applies to: 28-28


48-50: LGTM! Well-structured default implementation for the trait method.

Adding a default implementation for identifier_quote_style that returns None is a good approach - it maintains backward compatibility with existing dialects while allowing the Oracle dialect to override it.


58-58: LGTM! Oracle dialect is properly integrated.

The get_inner_dialect function has been correctly updated to return an OracleDialect instance for the DataSource::Oracle variant.


94-109: LGTM! Oracle dialect implementation matches Oracle SQL syntax rules.

The implementation correctly handles Oracle's identifier quoting requirements:

  1. Checks if the identifier is a SQL keyword
  2. Validates against the identifier pattern
  3. Checks if the identifier is not uppercase (Oracle defaults to uppercase)

This ensures proper SQL generation when working with Oracle databases.

ibis-server/tests/routers/v3/connector/oracle/conftest.py (5)

1-17: LGTM! Well-structured configuration for Oracle tests.

The imports, constants, and base URL are properly set up. Using a dedicated marker for Oracle tests will help with selective test execution.


19-24: LGTM! Efficient test marking strategy.

Using a collection hook to automatically mark all tests in this directory is an elegant solution that avoids manually marking each test function. The use of pathlib for path comparison is also a good practice.


26-51: LGTM! Comprehensive Oracle test fixture.

This fixture:

  1. Sets up an Oracle container with the correct image and credentials
  2. Loads test data from parquet files
  3. Creates a test table with CLOB data (2MB) to test large object handling
  4. Adds table and column comments for metadata testing

All steps are well organized and properly scoped at the module level for efficiency.


54-64: LGTM! Well-documented connection info extraction.

The fixture correctly extracts connection parameters from the container and includes a helpful comment explaining why container attributes can't be directly accessed.


67-70: LGTM! Proper connection URL formatting.

The connection URL is correctly formatted according to Oracle connection string standards.

ibis-server/tests/routers/v3/connector/oracle/test_query.py (4)

1-8: LGTM! Required imports for Oracle tests.

All necessary imports are included for base64 encoding, JSON handling, and the fallback disable header.


9-67: LGTM! Comprehensive test manifest with diverse column types.

The manifest includes:

  1. Various SQL data types (number, varchar, date, timestamp)
  2. Hidden columns
  3. Expression-based columns
  4. Timestamp with timezone
  5. Null values
  6. String concatenation
  7. Proper Oracle schema and table references

This coverage ensures robust testing of Oracle's type system and SQL capabilities.


70-73: LGTM! Standard fixture for manifest encoding.

Correctly encodes the manifest as base64 for API testing.


75-114: LGTM! Thorough test for Oracle query with explicit connection info.

The test:

  1. Sends a properly formatted request to the query endpoint
  2. Includes the fallback disable header
  3. Verifies HTTP 200 response
  4. Checks column count (excluding hidden column)
  5. Validates exact data values including timestamps and nulls
  6. Verifies all returned data types

This provides comprehensive validation of the Oracle connector's query functionality.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
ibis-server/tests/routers/v3/connector/oracle/test_function.py (1)

1-106: 🛠️ Refactor suggestion

Add negative test cases for Oracle-specific error scenarios.

The current tests only cover successful scenarios. Consider adding negative test cases for Oracle-specific error scenarios, such as invalid function usage or Oracle-specific syntax errors.

Add a new test function:

async def test_oracle_specific_error(client, manifest_str: str, connection_info):
    """Test Oracle-specific error handling."""
    response = await client.post(
        url=f"{base_url}/query",
        json={
            "connectionInfo": connection_info,
            "manifestStr": manifest_str,
            # Oracle error: Using || for string concatenation with NULL produces NULL
            "sql": "SELECT NULL || 'test' AS col",
        },
        headers={
            X_WREN_FALLBACK_DISABLE: "true",
        },
    )
    assert response.status_code == 200
    result = response.json()
    # In Oracle, NULL || string = NULL
    assert result["data"][0][0] is None
🧹 Nitpick comments (5)
ibis-server/tests/routers/v3/connector/oracle/test_function.py (5)

11-27: Consider enhancing the test manifest with additional columns and Oracle-specific data types.

The test manifest has a minimal structure with just one table and one column. While sufficient for basic testing, adding more columns with different Oracle-specific data types would provide more comprehensive test coverage.

 manifest = {
     "dataSource": "oracle",
     "catalog": "my_catalog",
     "schema": "my_schema",
     "models": [
         {
             "name": "orders",
             "tableReference": {
                 "schema": "SYSTEM",
                 "table": "ORDERS",
             },
             "columns": [
                 {"name": "orderkey", "expression": '"O_ORDERKEY"', "type": "number"},
+                {"name": "order_date", "expression": '"O_ORDERDATE"', "type": "timestamp"},
+                {"name": "comment", "expression": '"O_COMMENT"', "type": "varchar"},
+                {"name": "total", "expression": '"O_TOTAL"', "type": "number"},
             ],
         },
     ],
 }

48-49: Make the function count check more maintainable.

Instead of hard-coding "+1", use a variable to track the expected number of additional functions. This makes the test more maintainable when adding more custom functions later.

 response = await client.get(url=f"{base_url}/functions")
 assert response.status_code == 200
 result = response.json()
-assert len(result) == DATAFUSION_FUNCTION_COUNT + 1
+oracle_specific_functions = ["to_timestamp_tz"]
+assert len(result) == base_function_count + len(oracle_specific_functions)
 the_func = next(filter(lambda x: x["name"] == "to_timestamp_tz", result))

49-57: Add error handling for the function filter.

The next(filter(...)) call will raise a StopIteration exception if the function is not found. Consider adding error handling or a more explicit assertion to provide a better error message.

-the_func = next(filter(lambda x: x["name"] == "to_timestamp_tz", result))
+oracle_functions = list(filter(lambda x: x["name"] == "to_timestamp_tz", result))
+assert oracle_functions, "Oracle-specific function 'to_timestamp_tz' not found in results"
+the_func = oracle_functions[0]
 assert the_func == {
     "name": "to_timestamp_tz",
     "description": "Convert string to timestamp with time zone",
     "function_type": "scalar",
     "param_names": None,
     "param_types": None,
     "return_type": None,
 }

62-63: Use the previously stored base_function_count variable for consistency.

For maintainability, use the same variable established earlier to verify the count returns to the original value.

 response = await client.get(url=f"{base_url}/functions")
 assert response.status_code == 200
 result = response.json()
-assert len(result) == DATAFUSION_FUNCTION_COUNT
+assert len(result) == base_function_count, "Function count should return to original value"

93-93: Test Oracle's syntax for common table expressions.

The subquery syntax used is standard SQL, but Oracle has specific syntax for CTEs and temporary tables. Consider testing Oracle's syntax to ensure dialect-specific queries work.

-            "sql": "SELECT COUNT(*) AS col FROM (SELECT 1) AS temp_table",
+            "sql": "WITH temp_table AS (SELECT 1 FROM DUAL) SELECT COUNT(*) AS col FROM temp_table",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0106700 and f20e3c1.

⛔ Files ignored due to path filters (1)
  • ibis-server/resources/function_list/oracle.csv is excluded by !**/*.csv
📒 Files selected for processing (2)
  • ibis-server/tests/routers/v3/connector/oracle/conftest.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/oracle/test_function.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/tests/routers/v3/connector/oracle/conftest.py
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: cargo test (win64)
  • GitHub Check: cargo test (macos)
  • GitHub Check: cargo test (macos-aarch64)
  • GitHub Check: test
  • GitHub Check: ci
🔇 Additional comments (1)
ibis-server/tests/routers/v3/connector/oracle/test_function.py (1)

23-23:

Details

❓ Verification inconclusive

Verify Oracle identifier quoting convention.

The expression uses double quotes within a string ('"O_ORDERKEY"'). This seems intentional to handle Oracle's identifier quoting, but ensure this matches Oracle's syntax rules. Oracle typically requires double quotes for case-sensitive identifiers or those with special characters.


🏁 Script executed:

#!/bin/bash
# Check for similar identifier quoting patterns in the codebase
rg -A 3 "dataSource\": \"oracle\"" --glob "*.{py,json}" | grep -E "expression.*\""

Length of output: 80


Verify Oracle identifier quoting

The test uses embedded double quotes in the expression field ("O_ORDERKEY") to quote the identifier. Oracle requires double quotes around identifiers to preserve case or include special characters. Please confirm that this matches your connector’s quoting logic and Oracle’s syntax.

• File: ibis-server/tests/routers/v3/connector/oracle/test_function.py:23

@goldmedal goldmedal requested a review from douenergy May 5, 2025 01:45
@douenergy
Copy link
Copy Markdown
Contributor

Thanks @goldmedal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core ibis python Pull requests that update Python code rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ORACLE datasource variant missing in wren core manifest-macro

2 participants