fix(ibis): refactor the available function list#1112
Conversation
WalkthroughThis pull request updates several test files in the ibis-server by modifying function test assertions, expected values, and removing CSV‐based test functions. The constant Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant PySessionContext
participant Runtime
participant Database
Client->>PySessionContext: new(mdl_base64, remote_functions_path)
PySessionContext->>Runtime: Initialize asynchronous runtime
PySessionContext->>Runtime: Register remote functions asynchronously
Client->>PySessionContext: transform_sql(sql)
PySessionContext->>Runtime: Execute SQL transformation asynchronously
Runtime->>Database: Query registered functions (if needed)
Database-->>Runtime: Return query results
Runtime-->>PySessionContext: Return transformed SQL/result
PySessionContext-->>Client: Return transformed SQL/result
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
1a52a99 to
6c6b94d
Compare
|
We still have the BigQuery secret key issue for the CI. I tested BigQuery on my laptop, and it worked well. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (16)
wren-core/core/src/lib.rs (1)
4-4:Details
❓ Verification inconclusive
Expanded public API with broader datafusion exports
You've changed from selectively exporting
SessionContextto re-exporting all items fromdatafusion::arrow::*anddatafusion::prelude::*. While this makes more functionality available, it could lead to:
- Potential naming conflicts with other modules
- Tighter coupling to datafusion's API (changes in datafusion could directly impact users)
- Less clear API boundaries
Consider a more selective approach or documenting the expanded public API:
- pub use datafusion::arrow::*; + // Export specific types from datafusion::arrow that are needed + pub use datafusion::arrow::{array, compute, datatypes, record_batch}; - pub use datafusion::prelude::*; + // Export commonly used types from datafusion::prelude + pub use datafusion::prelude::{SessionContext, col, lit, when, SessionConfig};Let's check if any of the exported symbols might conflict with other common names:
Also applies to: 7-7
🏁 Script executed:
#!/bin/bash # Check for potential naming conflicts in exported symbols echo "Checking for common names that might conflict..." # Get symbols from datafusion::prelude cargo doc --no-deps --open &> /dev/null echo "Common names from datafusion::prelude to watch for conflicts:" cargo rustdoc -- --document-private-items 2>/dev/null find target/doc -name "*.html" | grep -i "datafusion.*prelude" | xargs grep -l "pub fn \|pub struct \|pub enum" | xargs grep -o "pub [a-z]* [A-Za-z0-9_]*" | grep -E "pub (fn|struct|enum) (error|result|option|string|vec|map|set|path|file|io|read|write|context|config|session)"Length of output: 636
Attention: Verify Exported Symbols and API Boundaries
The recent change re-exports all symbols from both
datafusion::arrowanddatafusion::prelude, which may expose the API to potential naming conflicts, tighter coupling with DataFusion’s internals, and a less clear public API surface. We attempted an automated check for common naming conflicts using documentation generated from DataFusion's prelude, but the script did not yield any output (thetarget/docdirectory was not found). This suggests that automated validation was inconclusive, so please verify manually if any of the re-exported symbols conflict with common names used elsewhere in the codebase.Recommendations:
- Review Naming Conflicts: Manually inspect the exported symbols to ensure they don’t conflict with common names (e.g.,
context,config,session).- Consider a Selective Export: To minimize risks, export only those components that are actually required. For example:
- pub use datafusion::arrow::*; + // Export specific items needed from datafusion::arrow: + pub use datafusion::arrow::{array, compute, datatypes, record_batch}; - pub use datafusion::prelude::*; + // Export selected functionality from datafusion::prelude: + pub use datafusion::prelude::{SessionContext, col, lit, when, SessionConfig};- Documentation: Clearly document the expanded public API and note the rationale behind the selective exports.
Please manually verify the exported symbols for potential conflicts and evaluate whether a more selective export approach would reduce coupling with DataFusion’s API.
ibis-server/tools/remote_function_check.py (3)
1-43: New utility script for duplicate function detection looks good, with minor improvement opportunitiesThis script efficiently identifies and removes duplicate functions from CSV files by comparing against functions available in wren_core. It supports the PR's goal of eliminating duplicate remote functions.
Consider these improvements:
- Add error handling for file operations (file not found, permission errors)
- Replace print statements with proper logging
- Make the CSV structure parsing more robust (currently assumes function name is in second column)
-import argparse -import os -import wren_core +import argparse +import logging +import os +import wren_core +from pathlib import Path + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) # Later in code: -print("Default Function count: ", len(function_names)) -print("Function is already in the function list:") +logger.info(f"Default Function count: {len(function_names)}") +logger.info("Function is already in the function list:") # Add error handling: +try: with open(args.path, "r") as file: # ... +except FileNotFoundError: + logger.error(f"File not found: {args.path}") + exit(1) +except PermissionError: + logger.error(f"Permission denied: {args.path}") + exit(1)🧰 Tools
🪛 Ruff (0.8.2)
1-1: File
ibis-server/tools/remote_function_check.pyis part of an implicit namespace package. Add an__init__.py.(INP001)
23-23:
(T201)
24-24:
(T201)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
42-42:
(T201)
27-27: Remove unnecessary file mode parameterThe default mode for
open()is already 'r' (read), so specifying it is redundant.-with open(args.path, "r") as file: +with open(args.path) as file:🧰 Tools
🪛 Ruff (0.8.2)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
1-6: Add__init__.pyto avoid implicit namespace package warningStatic analysis detected that this file is part of an implicit namespace package.
Create an empty
__init__.pyfile in thetoolsdirectory to resolve this warning.🧰 Tools
🪛 Ruff (0.8.2)
1-1: File
ibis-server/tools/remote_function_check.pyis part of an implicit namespace package. Add an__init__.py.(INP001)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (1)
56-65: Database-specific function testing looks good, but consider checking result lengthThe test now properly checks for Snowflake-specific function "is_null_value" instead of generic "abs", which aligns with the PR goal to improve DataSource-specific function handling.
Consider re-enabling the count assertion to verify the total number of functions:
- # assert len(result) == DATAFUSION_FUNCTION_COUNT + 60 + assert len(result) == DATAFUSION_FUNCTION_COUNT + 60 # Adjust number if neededwren-core-py/src/remote_functions.rs (4)
66-75: Semantic clarity on variable naming.
Within.and_then(|types| ...), you're processing parameter names but calling the variabletypes. For clarity, consider updating the variable name. Otherwise, the logic to flatten and filter out empty strings is sound..map(|names| { names .into_iter() .flatten() .collect::<Vec<String>>() .join(",") }) -.and_then(|types| if types.is_empty() { None } else { Some(types) }); +.and_then(|joined_names| if joined_names.is_empty() { None } else { Some(joined_names) });
86-90: Empty string handling for return_type.
Converting the empty string intoNoneis a neat solution. If whitespaces are possible in the CSV, you may want to trim them first, but that’s optional.
109-116: Potential whitespace trimming.
When buildingVec<Option<String>>for param names, consider trimming each piece to avoid storing accidental whitespace..map(|name| { let trimmed = name.trim(); if trimmed.is_empty() { None } else { Some(trimmed.to_string()) } })
121-128: Param type parsing consistency.
This is the same approach used for param names. Trimming may also be beneficial here for cleanliness.wren-core-py/src/context.rs (2)
62-62: Unwrap risk in default.
UsingRuntime::new().unwrap()can panic on failure. Consider.expect(...)or error handling if the runtime fails to initialize.
238-305: Async function “get_regietered_functions” with potential typo.
The method’s name is misspelled (“regietered”). Consider renaming toget_registered_functionsfor clarity. Otherwise, the batch parsing logic is solid.-async fn get_regietered_functions( +async fn get_registered_functions(wren-core/core/src/mdl/function.rs (5)
29-55: Handling fallback signatures on unrecognized parameter types.The new
get_signaturemethod andtransform_param_typefunction effectively parse and apply exact signatures whenparam_typesis fully known. However, note that if any parameter type parsing fails, the entire signature falls back to[Nullary, VariadicAny]. Consider logging or otherwise flagging partial mismatches, so users can diagnose unexpected fallback signatures.
89-151: Introducing theReturnTypeenum and related logic.This addition elegantly handles specialized return types, skipping complex nested array logic for now. For future extensibility, consider whether deeper nesting or advanced type resolution might be needed (e.g., arrays of arrays). Otherwise, this approach is concise and covers common cases.
178-190: Conversion with potential panic on invalid return type.
impl From<RemoteFunction> for ByPassScalarUDFpanics ifReturnType::from_strfails, aligning with the chosen "panic on invalid input" strategy. If more graceful handling is desired, consider returning aResult<Self>instead to propagate errors.
261-269: Consistent panic on invalid return type for Aggregate UDF.As with the scalar variant, the logic is clear but might be improved by propagating errors if dynamic input is expected.
466-640: Extensive coverage of remote-to-bypass function testing.These tests thoroughly validate various parameter name/type scenarios and return-type enum variants. To further enhance coverage, consider adding negative tests that confirm how errors/panics are handled upon invalid input (e.g., unrecognized data types).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
ibis-server/resources/function_list/bigquery.csvis excluded by!**/*.csvibis-server/resources/function_list/canner.csvis excluded by!**/*.csvibis-server/resources/function_list/clickhouse.csvis excluded by!**/*.csvibis-server/resources/function_list/mssql.csvis excluded by!**/*.csvibis-server/resources/function_list/mysql.csvis excluded by!**/*.csvibis-server/resources/function_list/postgres.csvis excluded by!**/*.csvibis-server/resources/function_list/snowflake.csvis excluded by!**/*.csvibis-server/resources/function_list/trino.csvis excluded by!**/*.csvwren-core-py/Cargo.lockis excluded by!**/*.lockwren-core-py/poetry.lockis excluded by!**/*.lockwren-core-py/tests/functions.csvis excluded by!**/*.csv
📒 Files selected for processing (20)
ibis-server/tests/conftest.py(1 hunks)ibis-server/tests/routers/v3/connector/bigquery/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/canner/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/mssql/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/mysql/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/postgres/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/snowflake/test_functions.py(1 hunks)ibis-server/tests/routers/v3/connector/trino/test_functions.py(1 hunks)ibis-server/tools/remote_function_check.py(1 hunks)wren-core-py/src/context.rs(7 hunks)wren-core-py/src/errors.rs(1 hunks)wren-core-py/src/remote_functions.rs(3 hunks)wren-core-py/tests/test_modeling_core.py(2 hunks)wren-core/core/src/lib.rs(1 hunks)wren-core/core/src/logical_plan/analyze/plan.rs(5 hunks)wren-core/core/src/logical_plan/utils.rs(9 hunks)wren-core/core/src/mdl/function.rs(7 hunks)wren-core/core/src/mdl/mod.rs(4 hunks)wren-core/core/src/mdl/utils.rs(3 hunks)
🧰 Additional context used
🧬 Code Definitions (6)
ibis-server/tests/routers/v3/connector/snowflake/test_functions.py (9)
ibis-server/tests/routers/v3/connector/canner/test_functions.py (1)
set_remote_function_list_path(36-40)ibis-server/tests/routers/v3/connector/mssql/test_functions.py (1)
set_remote_function_list_path(36-40)ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)
set_remote_function_list_path(37-41)ibis-server/tests/routers/v3/connector/postgres/test_functions.py (1)
set_remote_function_list_path(38-42)ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py (1)
set_remote_function_list_path(36-40)ibis-server/tests/routers/v3/connector/trino/test_functions.py (1)
set_remote_function_list_path(36-40)ibis-server/tests/routers/v3/connector/bigquery/conftest.py (1)
set_remote_function_list_path(33-37)ibis-server/app/config.py (1)
set_remote_function_list_path(70-71)ibis-server/tests/conftest.py (1)
client(18-23)
wren-core/core/src/logical_plan/analyze/plan.rs (1)
wren-core/core/src/logical_plan/utils.rs (1)
try_map_data_type(97-102)
wren-core-py/src/context.rs (3)
wren-core/core/src/mdl/function.rs (14)
name(216-218)name(279-281)name(342-344)new(165-175)new(248-258)new(311-321)from(179-188)from(262-271)from(325-334)param_types(46-49)return_type(224-226)return_type(287-289)from_str(79-86)from_str(119-129)wren-core/core/src/mdl/mod.rs (3)
new(117-172)new(421-423)register_remote_function(383-408)wren-core-py/src/remote_functions.rs (2)
from(65-99)from(103-139)
wren-core-py/tests/test_modeling_core.py (2)
wren-core-py/src/context.rs (3)
transform_sql(139-148)functions(93-96)get_available_functions(151-160)ibis-server/app/routers/v3/connector.py (1)
functions(104-115)
wren-core/core/src/mdl/mod.rs (5)
wren-core/core/src/logical_plan/utils.rs (1)
try_map_data_type(97-102)wren-core/core/src/mdl/context.rs (1)
new(228-231)wren-core/core/src/mdl/function.rs (3)
new(165-175)new(248-258)new(311-321)wren-core/core/src/logical_plan/analyze/model_anlayze.rs (2)
new(67-75)new(788-796)wren-core/core/src/mdl/lineage.rs (1)
new(26-37)
wren-core/core/src/mdl/function.rs (1)
wren-core/core/src/logical_plan/utils.rs (1)
map_data_type(104-171)
🪛 Ruff (0.8.2)
ibis-server/tools/remote_function_check.py
1-1: File ibis-server/tools/remote_function_check.py is part of an implicit namespace package. Add an __init__.py.
(INP001)
23-23: print found
(T201)
24-24: print found
(T201)
27-27: Unnecessary open mode parameters
Remove open mode parameters
(UP015)
42-42: print found
(T201)
🔇 Additional comments (67)
ibis-server/tests/conftest.py (1)
14-14:Details
❓ Verification inconclusive
Significant increase in DATAFUSION_FUNCTION_COUNT constant
The constant has been increased from 273 to 25941, which is a substantial change. This appears to align with retrieving function information from INFORMATION_SCHEMA as mentioned in the PR description, rather than manually defining functions.
Please verify that this dramatic increase is expected and won't negatively impact test performance:
🏁 Script executed:
#!/bin/bash # Check if test performance is affected by the large function count time=$(python -c " import time start = time.time() import sys sys.path.append('ibis-server') from tests.conftest import DATAFUSION_FUNCTION_COUNT print(f'Loading {DATAFUSION_FUNCTION_COUNT} functions took {time.time() - start:.2f} seconds') ") echo "$time"Length of output: 521
Review Verification: Manual Check Required for Dependency Issue
The test performance verification did not complete as expected because the environment is missing the
asgi_lifespanmodule (error: ModuleNotFoundError: No module named 'asgi_lifespan'). The dramatic increase inDATAFUSION_FUNCTION_COUNTfrom 273 to 25941 appears intentional—aligning with the new approach of retrieving function data from INFORMATION_SCHEMA—but its impact on test performance remains unverified.
- Action Items:
- Ensure that the required
asgi_lifespanmodule is installed or properly configured for the test environment.- Manually verify that loading 25,941 functions does not degrade test performance when the dependencies are correctly set up.
wren-core-py/src/errors.rs (1)
4-4: Reintroducing ParseIntError importThe import for
ParseIntErrorhas been added back, which is correct since the code already has an implementation forFrom<ParseIntError>at lines 64-68. This ensures the error handling for integer parsing works properly.ibis-server/tests/routers/v3/connector/canner/test_functions.py (1)
56-65:Details
✅ Verification successful
Updated function testing and reduced count
The test now:
- Expects 28 additional functions (down from 30)
- Tests the "to_base64" function instead of "abs"
- Has updated the expected function description and return type
These changes align with the PR's goal of eliminating duplicate remote functions for each DataSource.
Let's verify if other test files have been consistently updated with the new function pattern:
🏁 Script executed:
#!/bin/bash # Check if all connector test files have been updated to test "to_base64" instead of "abs" echo "Checking for consistent function test patterns across connector tests..." # Look for test files that might still be using "abs" function grep -r "x\[\"name\"\] == \"abs\"" --include="test_*.py" ibis-server/tests/routers/v3/connector/ # Check other connectors for consistent handling of to_base64 echo -e "\nChecking to_base64 implementation across connector test files:" grep -r "x\[\"name\"\] == \"to_base64\"" --include="test_*.py" ibis-server/tests/routers/v3/connector/Length of output: 762
Consistent test updates verified across connectors
The updated test in
ibis-server/tests/routers/v3/connector/canner/test_functions.pynow correctly expects 28 additional functions, tests the "to_base64" function (with the updated description and return type), and is consistent with similar changes in other connectors (e.g., in the Trino tests). Shell script results confirmed that there are no remaining tests still using the "abs" function and that the new "to_base64" pattern is applied uniformly.ibis-server/tests/routers/v3/connector/mssql/test_functions.py (1)
57-65: Function test now uses database-specific 'sysdatetime' instead of generic 'floor'The test now validates a MSSQL-specific function with appropriate type information, aligning with the PR's goal to improve function information retrieval and type signatures for each DataSource.
ibis-server/tests/routers/v3/connector/mysql/test_functions.py (1)
58-66: Database-specific function test updated correctlyThe test now validates MySQL-specific function "lcase" with appropriate type information (Utf8 instead of int), which aligns with the PR goal of improving function registration and type signatures for each DataSource.
wren-core/core/src/mdl/mod.rs (4)
1-1: Import of try_map_data_type is consistent with the implementation changes.The import of
try_map_data_typefromlogical_plan::utilsaligns with the function's usage throughout the file. This change is part of the refactoring effort to improve error handling for data type mapping.
222-222: Enhanced error handling with try_map_data_type.Replacing
map_data_typewithtry_map_data_typeimproves robustness by providing a fallback to Utf8 type when parsing fails, as shown in the implementation from the relevant code snippets.
391-392: Consistent application of try_map_data_type across all function types.The change consistently applies
try_map_data_typefor all three function types (Scalar, Aggregate, Window), ensuring uniform error handling behavior across different function registrations.Also applies to: 397-398, 403-404
1310-1311: Improved error message specificity for struct field access.The error message has been updated to provide more detailed information about type mismatches when accessing struct fields. The new message explicitly mentions that the operation is only valid for
Struct,Map, orNulltypes, helping developers identify the issue more quickly.ibis-server/tests/routers/v3/connector/trino/test_functions.py (1)
57-65: Test updated to verify to_base64 function instead of array_distinct.The test has been updated to verify the "to_base64" function instead of "array_distinct", with updated type signatures that match the correct specification. This aligns with the PR objective to improve type signatures and ensure consistency in function definitions.
The parameter type changed from "array" to "Binary" and return type from "array" to "Utf8", which is the correct signature for a base64 encoding function.
ibis-server/tests/routers/v3/connector/postgres/test_functions.py (1)
65-67: Updated extract function type signatures to use precise Arrow types.The parameter and return types for the "extract" function have been updated to use more precise Arrow type definitions:
- Parameter types changed from "text,timestamp" to "Utf8,Timestamp(Nanosecond, None)"
- Return type changed from "numeric" to "Decimal128(38, 10)"
These changes reflect a move towards more accurate type representations that match the Arrow data model, which is essential for proper type handling across systems.
ibis-server/tests/routers/v3/connector/clickhouse/test_functions.py (1)
57-65: Test updated to verify uniq aggregate function instead of abs scalar function.The test case has been modified to check for the "uniq" aggregate function instead of the "abs" scalar function, with appropriate updates to the function properties:
- Function type changed from "scalar" to "aggregate"
- Description updated to correctly describe the HyperLogLog-based cardinality estimation
- Parameter types and return types set to None, which is appropriate for this function in ClickHouse
This change better represents the ClickHouse-specific functions in the test suite and aligns with the PR objective to improve function definitions across different data sources.
ibis-server/tests/routers/v3/connector/bigquery/test_functions.py (2)
48-55: Enhanced test precision for string_agg functionThe test now properly filters for the specific
string_aggfunction withLargeUtf8,LargeUtf8parameter types, which makes the test more precise and less prone to errors if multiple variants of this function exist.
57-63: Updated string_agg function attributes to match actual implementationThe function metadata has been updated with a more descriptive explanation, correct parameter names, and accurate type information that aligns with the actual BigQuery implementation.
wren-core/core/src/logical_plan/utils.rs (5)
95-102: Improved error handling with fallback for unsupported data typesThe new
try_map_data_typefunction gracefully handles type mapping failures by providing a fallback toDataType::Utf8when a data type can't be parsed. This prevents errors from propagating and makes the system more robust when dealing with unknown or unsupported types.
106-107: Better variable naming for improved clarityIntroducing
lower_data_typefor the lowercase version of the data type string improves code readability by making it clear which value is being used in subsequent operations.
147-148: Added support for XML and JSONB data typesSupport for additional data types enhances compatibility with various database systems. The appropriate fallback types are chosen for data types not natively supported by Arrow.
165-168: Fixed case sensitivity in the DataType parsing fallbackThe code now correctly handles case sensitivity by using the original data type string with
DataType::from_str()rather than the lowercase version, which may lead to parsing errors for case-sensitive type names.
437-457: Updated test cases to use try_map_data_typeTest cases have been appropriately updated to use the new error-handling function while maintaining the same assertions, ensuring that the existing functionality continues to work correctly.
wren-core/core/src/mdl/utils.rs (2)
215-216: Consistent adoption of error-handled type mappingReplacing
map_data_typewithtry_map_data_typein the field creation function improves error handling for column type mapping, making the system more resilient when processing columns with unsupported data types.
233-240: Enhanced robustness in remote field handlingThe remote field creation now uses the error-handled type mapping function, reducing the risk of failures when processing data types from remote data sources that might have type systems incompatible with Arrow.
wren-core/core/src/logical_plan/analyze/plan.rs (6)
23-23: Updated import to use the error-handling version of map_data_typeThe import has been updated to use
try_map_data_typeinstead ofmap_data_type, aligning with the refactoring effort to improve error handling throughout the codebase.
230-233: Improved error handling in model plan node buildingUsing
try_map_data_typein the field creation process for the model plan node ensures that the system can handle unexpected data types gracefully without failing the entire plan construction.
767-770: Enhanced robustness in model source node field creationThe model source node now uses the error-handled type mapping function when creating fields, ensuring that wildcard selections can handle columns with unsupported data types.
807-810: Consistent error handling across field creationUsing
try_map_data_typeconsistently in all model source node field creation paths ensures uniform behavior when handling data types in both wildcard and explicit column scenarios.
901-903: Improved calculation plan node field creationThe calculation plan node now uses error-handled type mapping for the calculation column's field creation, preventing failures when dealing with complex calculated expressions that might result in unsupported data types.
906-908: Consistent error handling for primary key fieldsUsing
try_map_data_typefor primary key field creation in calculation plan nodes ensures that the join operations will work correctly even when primary keys have unusual or complex data types.wren-core-py/src/remote_functions.rs (5)
22-22: Non-issue import addition.
Addingstd::fmt::Displayis appropriate for implementing theDisplaytrait below.
27-27: Extended trait derivation.
DerivingEq,PartialEq,Hash, andDebugonPyRemoteFunctionis useful for hashing, set membership checks, and debugging.
77-85: Consistent parameter type flattening.
This mirrors the parameter names logic and is equally valid. No concerns.
94-94: Properly using the transformed return_type.
Aligns with the new handling for empty versus non-empty strings.
141-155: UsefulDisplayimplementation.
This user-friendly string representation will help with logging and debugging. No issues.wren-core-py/tests/test_modeling_core.py (10)
97-101: Updated function call for two parameters.
Using bothc_custkeyarguments confirms the function signature. The rewritten SQL assertion is correct.
109-109: Large function list size.
Raising the count to25948might affect performance. Verify this aligns with new function entries and ensure no unintentional duplicates.
112-116: Consistent transformation verification.
Ensures the generated SQL handles both parameters correctly. LGTM.
121-121: Additional function count check.
25941closely matches the previous large count—just confirm correctness so no duplication or missing entries exist.
131-131: Specifying return type as Int32.
Reflects refined data type handling. The test matches recent changes.
133-133: Correct parameter types.
“Int32,Int32” is consistent with the new approach.
141-148: Added scalar function “add_custom”.
Parameters are listed in CSV butparam_namesis intentionallyNone. This is fine if it’s expected.
149-156: Validating array-based function.
A function returningNonefor both param and return types suggests it’s untyped or determined at runtime. No immediate concerns.
157-164: Testing function with no param names and no return type.
This aligns with the possibility of dynamic type resolution in the system.
165-175: Function missing param type in CSV.
The test clarifies how a missing param type yieldsNone. Implementation is logically consistent with your data flow.wren-core-py/src/context.rs (12)
25-25: Added import for FromStr.
Enables parsing of string-based function types. Straightforward addition.
27-28: New imports for vec and Runtime.
These support your asynchronous patterns. No concerns.
31-31: Importing GenericStringType.
Likely needed for reading string columns from DataFusion. Looks good.
38-41: Expanded imports from wren_core.
Brings inmdl,AggregateUDF, etc. to manage user-defined functions. Safe changes.
48-48: Storing the runtime in the session context.
This design allows async operations. Straightforward.
86-87: Information Schema enabled.
Allows queries againstinformation_schemafor function metadata, which matches your function retrieval logic.
88-88: Runtime creation with error handling.
map_err(CoreError::from)?is good for safer initialization.
90-99: Asynchronous retrieval of registered functions.
Good approach to check for duplicates before re-registering. Minimizes overhead.
115-115: Consistent runtime usage.
Keeps the runtime in sync when an MDL is provided. No issues.
140-146: Async transform flow.
block_on(mdl::transform_sql_with_ctx(...))is consistent with the new runtime-based context.
152-159: Async function list retrieval.
Modeling functions asPyRemoteFunctionafter awaiting the DataFusion queries is orderly.
203-216: Registering remote functions by function type.
Good approach to unify the code path for scalar, aggregate, and window UDFs.wren-core/core/src/mdl/function.rs (13)
3-3: Consolidate and standardize new imports.These import statements for
internal_err,not_impl_err, and items fromdatafusion::logical_expras well as themap_data_typeutility appear consistent and purposeful. They unify error handling and provide necessary functions for the new return-type logic.Also applies to: 8-10, 17-18
24-25: Param fields allow for partial undefined parameters.Expanding
param_namesandparam_typestoOption<Vec<Option<String>>>provides flexibility for scenarios where certain parameter names or types may remain unspecified. This design choice is coherent, but ensure that downstream logic (like doc generation and signature construction) gracefully handles these nestedNonevalues.
57-57: AddedPartialEqto derive macros.Adding
PartialEqfor the enum or struct is beneficial for test assertions and comparisons. No concerns here.
66-66: FunctionType display implementation unchanged.The
impl Display for FunctionTyperemains straightforward, mirroring the enum variants. No issues noted.
80-80: String matching infrom_straligns with the existing pattern.The
match s.to_lowercase().as_str()approach is consistent across similar code. Approved.
159-159: EmbeddingReturnTypeanddoc:Storing
ReturnTypeanddocdirectly inByPassScalarUDFis consistent with the broader refactor design. The approach is clear and maintainable.Also applies to: 161-161
191-209: Building documentation from partially known parameters.Using
param_typesstrings as argument descriptions and defaulting to an empty string for missing elements ensures consistent doc generation. This looks good, and the logic handlesNonegracefully.
224-225: ByPassScalarUDF return type forwarding.The
return_typemethod simply defers toself.return_type.to_data_type(arg_types), which is coherent with the newReturnTypeabstraction.
232-234: Documentation forwarding.Returning
self.doc.as_ref()is straightforward. No concerns here.
242-242: EmbeddingReturnTypeanddocin ByPassAggregateUDF.Adopting the same design pattern for aggregates ensures consistency across UDF implementations.
Also applies to: 244-244
287-288: Aggregate UDF return type logic is straightforward.The code properly delegates to
ReturnType::to_data_type(arg_types).
295-297: Aggregate UDF documentation retrieval.Again, a simple reference to
self.doc, consistent with previous patterns.
357-360: Window function field logic.The
fieldmethod usesto_data_type(field_args.input_types())to derive the return type, which is aligned with the rest of the code.
Description
ibis-server/tools/remote_function_check.pySummary by CodeRabbit
New Features
Improvements