Skip to content

Document catalog-filtered functions endpoint, add RFC design document, and improve e2e tests#22

Closed
Copilot wants to merge 1 commit intomasterfrom
copilot/add-catalog-function-registry
Closed

Document catalog-filtered functions endpoint, add RFC design document, and improve e2e tests#22
Copilot wants to merge 1 commit intomasterfrom
copilot/add-catalog-function-registry

Conversation

Copy link

Copilot AI commented Oct 8, 2025

  • Explored the repository structure and existing implementation
  • Verified that /v1/functions/{catalog} endpoint is already implemented in C++
  • Verified that getFunctionsMetadata(catalog) function exists and filters by catalog
  • Verified that NativeFunctionDefinitionProvider calls the catalog-filtered endpoint
  • Verified that tests exist for the catalog filtering functionality
  • Updated documentation to reflect the new /v1/functions/{catalog} endpoint
  • Verified the implementation is complete and working correctly
  • Reverted last commit as requested
  • Created comprehensive RFC design document
  • Updated RFC with proper Mermaid diagram
  • Clarified that all namespace managers use catalog-filtered endpoints
  • Improved e2e test cases with better naming and comprehensive coverage
  • Addressed all code review feedback
  • Moved hive function registration to proper location in PrestoServer
  • Fixed transaction error in catalog check
  • Added worker-side evaluation tests with column values

Summary

The implementation is complete with catalog-filtered functions endpoint at /v1/functions/{catalog}. All code review feedback has been addressed, including adding tests that properly evaluate functions on workers using column values instead of constants.

Key Changes

  1. Documentation: Added comprehensive RFC and updated sidecar documentation
  2. Test Improvements: Renamed and enhanced e2e tests with better coverage including worker-side evaluation tests
  3. Code Quality: Addressed all review feedback including proper separation of concerns and transaction-safe catalog checking
Original prompt

The native sidecar function registry should support registering functions under different catalogs and schemas, not just the built-in namespace. This would allow users to namespace their custom C++ functions separately from built-in functions, as one can do right now with a function namespace manager.

Presto Component, Service, or Connector
Native sidecar and plugin

Possible Implementation
Add catalog-filtered endpoint in C++: Implement a new endpoint /v1/functions/{catalog} in PrestoServer.cpp that filters functions by catalog, similar to the specification already defined in the OpenAPI spec.

Support multiple namespace registrations: Allow multiple native sidecar endpoints to be registered with different (non-built-in) namespaces. This capability may already be working without issue, but this requires verification.

Modify native sidecar plugin: Update the plugin to properly filter functions by catalog when fetching from the sidecar. Currently, all functions are bundled together regardless of catalog, which is incorrect.

Example Screenshots (if appropriate):
N/A

Context
To reach feature parity with Java, we need the ability to separately namespace functions.


💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Copilot AI changed the title [WIP] Add support for catalog-based function registration Document catalog-filtered functions endpoint in native sidecar Oct 8, 2025
Copilot AI requested a review from Joe-Abraham October 8, 2025 06:38
@Joe-Abraham Joe-Abraham changed the base branch from hiveInitcap to master October 8, 2025 08:14
@Joe-Abraham
Copy link
Owner

@copilot

The /v1/functions/{catalog} path conflicts with the existing /v1/functions/{schema} route—consider renaming or ordering the handlers to avoid ambiguous routing.
The hard-coded blocklist in getFunctionsMetadata is scattered in the implementation—extract it into a shared constant or make it configurable to simplify future updates.

Copy link
Author

Copilot AI commented Oct 8, 2025

@copilot

The /v1/functions/{catalog} path conflicts with the existing /v1/functions/{schema} route—consider renaming or ordering the handlers to avoid ambiguous routing.
...

Fixed both issues:

  1. Path conflict resolved: Changed endpoint from /v1/functions/{catalog} to /v1/functions/catalog/{catalog} to avoid ambiguity with the existing /v1/functions/{schema} route. This makes the intent clear and prevents routing conflicts.

  2. Blocklist extracted: Moved the hard-coded blocklist to a shared constant kFunctionBlockList at namespace scope, eliminating duplication between getFunctionsMetadata() and getFunctionsMetadata(catalog).

Changes made in commit: Fix endpoint path conflict and extract blocklist constant

Copilot AI changed the title Document catalog-filtered functions endpoint in native sidecar Fix endpoint path conflict and refactor catalog-filtered functions endpoint Oct 8, 2025
@Joe-Abraham
Copy link
Owner

@copilot drop the last commit

Copy link
Author

Copilot AI commented Oct 8, 2025

@copilot drop the last commit

Reverted the last commit (f58b403). The changes to the endpoint path and blocklist extraction have been undone.

Copilot AI changed the title Fix endpoint path conflict and refactor catalog-filtered functions endpoint Document catalog-filtered functions endpoint in native sidecar Oct 8, 2025
@Joe-Abraham
Copy link
Owner

@copilot write an RFC document based on the above change as a design document the description of issue is as below

Expected Behavior or Use Case
The native sidecar function registry should support registering functions under different catalogs and schemas, not just the built-in namespace. This would allow users to namespace their custom C++ functions separately from built-in functions, as one can do right now with a function namespace manager.

Presto Component, Service, or Connector
Native sidecar and plugin

Possible Implementation
Add catalog-filtered endpoint in C++: Implement a new endpoint /v1/functions/{catalog} in PrestoServer.cpp that filters functions by catalog, similar to the specification already defined in the OpenAPI spec.

Support multiple namespace registrations: Allow multiple native sidecar endpoints to be registered with different (non-built-in) namespaces. This capability may already be working without issue, but this requires verification.

Modify native sidecar plugin: Update the plugin to properly filter functions by catalog when fetching from the sidecar. Currently, all functions are bundled together regardless of catalog, which is incorrect.

Example Screenshots (if appropriate):
N/A

Context
To reach feature parity with Java, we need the ability to separately namespace functions.

Copy link
Author

Copilot AI commented Oct 10, 2025

@copilot write an RFC document based on the above change as a design document the description of issue is as below

Expected Behavior or Use Case
...

Created RFC document at presto-native-execution/presto_cpp/docs/RFC-Catalog-Filtered-Functions.md. The document includes:

  • Problem statement and background
  • Architecture design with diagrams
  • Detailed implementation across C++ backend, filtering logic, and Java plugin
  • API specification aligned with OpenAPI spec
  • Usage examples and configuration
  • Security considerations
  • Performance analysis
  • Migration path and future enhancements

The RFC provides a comprehensive design document for the catalog-filtered functions feature in the native sidecar.

Copilot AI changed the title Document catalog-filtered functions endpoint in native sidecar Document catalog-filtered functions endpoint and add RFC design document Oct 10, 2025
@Joe-Abraham
Copy link
Owner

@sourcery-ai write the design document and how the feature is implemented.

@Joe-Abraham
Copy link
Owner

https://github.com/sourcery-ai write the design document and how the feature is implemented.

@Joe-Abraham
Copy link
Owner

@sourcery-ai generate the design document and how the feature is implemented.

@sourcery-ai
Copy link

sourcery-ai bot commented Oct 13, 2025

Here is a comprehensive design document and implementation summary for the catalog-filtered functions feature in the native sidecar function registry:


Catalog-Filtered Functions in Native Sidecar Function Registry

Overview

This feature enables namespace separation for C++ functions in Presto by allowing registration and querying of functions under different catalogs and schemas. It brings the native sidecar registry to feature parity with Java function namespace managers.

Problem Statement

Previously, all functions (built-in and custom) were bundled together, preventing proper isolation and organization. Users could not namespace custom C++ functions separately from built-ins, nor register multiple sidecar endpoints with different namespaces.

Design & Architecture

The solution introduces catalog-based filtering at three layers:

graph TD
    Coordinator[Presto Coordinator]
    NamespaceManager1[NativeFunctionNamespaceManager - Catalog: mycatalog]
    NamespaceManager2[NativeFunctionNamespaceManager - Catalog: presto]
    Sidecar[Presto Native Sidecar (C++ Worker)]
    EndpointAll[GET /v1/functions (all functions)]
    EndpointCatalog[GET /v1/functions/{catalog} (filtered)]
    Filter[getFunctionsMetadata(catalog) - C++ filtering]
    Registry[Velox Function Registry]

    Coordinator --> NamespaceManager1
    Coordinator --> NamespaceManager2
    NamespaceManager1 --> EndpointCatalog
    NamespaceManager2 --> EndpointCatalog
    EndpointCatalog --> Filter
    Filter --> Registry
Loading
  • C++ Backend: Adds /v1/functions/{catalog} endpoint in PrestoServer.cpp that delegates to getFunctionsMetadata(catalog).
  • Filtering Logic: In FunctionMetadata.cpp, functions are filtered by catalog prefix (from fully qualified name: catalog.schema.function_name).
  • Java Plugin: Each NativeFunctionNamespaceManager instance is bound to a catalog and queries only its catalog's functions from the sidecar.

API Specification

  • GET /v1/functions: Returns all functions (unfiltered).
  • GET /v1/functions/{catalog}: Returns only functions from the specified catalog.

Example response for /v1/functions/mycatalog:

{
  "my_custom_function": [
    {
      "outputType": "varchar",
      "paramTypes": ["varchar"],
      "schema": "myschema",
      "functionKind": "SCALAR",
      "routineCharacteristics": {
        "language": {"languageName": "CPP"},
        "determinism": "DETERMINISTIC",
        "nullCallClause": "RETURNS_NULL_ON_NULL_INPUT"
      }
    }
  ]
}

Implementation Details

  • Functions are registered in Velox as catalog.schema.function_name.
  • Filtering is performed by splitting the function name and matching the catalog.
  • The Java plugin does not fall back to the unfiltered endpoint, preventing cross-catalog leakage.
  • Robust error handling and test coverage are included.

Usage Example

Configure multiple function namespace managers:

# etc/function-namespace/mycatalog.properties
function-namespace-manager.name=native
function-implementation-type=CPP
supported-function-languages=CPP

Register a function in C++:

registerFunction<MyCustomFunction, Varchar, Varchar>({"mycatalog.myschema.my_custom_function"});

Security & Performance

  • Prevents cross-catalog leakage by strict filtering and error handling.
  • Minimal performance impact (O(n) filtering, reduced network payload).

Migration & Compatibility

  • Fully backward compatible; existing endpoints and managers continue to work.
  • No migration required; opt-in by using the new endpoint and registering functions with catalog-qualified names.

Future Enhancements

  • Schema-level filtering, configurable blocklist, individual function lookup, caching, and metrics.

If you need a markdown copy of this document or want a more detailed breakdown of any section, let me know!

@Joe-Abraham
Copy link
Owner

@sourcery-ai generate the design document and how the feature is implemented.

@sourcery-ai
Copy link

sourcery-ai bot commented Oct 13, 2025

Certainly! Here is a markdown-formatted design document summarizing the catalog-filtered functions feature and its implementation in the native sidecar function registry:


Catalog-Filtered Functions in Native Sidecar Function Registry

Overview

This feature enables namespace separation for C++ functions in Presto by allowing registration and querying of functions under different catalogs and schemas. It brings the native sidecar registry to feature parity with Java function namespace managers.

Problem Statement

Previously, all functions (built-in and custom) were bundled together, preventing proper isolation and organization. Users could not namespace custom C++ functions separately from built-ins, nor register multiple sidecar endpoints with different namespaces.

Design & Architecture

The solution introduces catalog-based filtering at three layers:

graph TD
    Coordinator[Presto Coordinator]
    NamespaceManager1[NativeFunctionNamespaceManager - Catalog: mycatalog]
    NamespaceManager2[NativeFunctionNamespaceManager - Catalog: presto]
    Sidecar[Presto Native Sidecar - C++ Worker]
    EndpointAll[GET /v1/functions (all functions)]
    EndpointCatalog[GET /v1/functions/{catalog} (filtered)]
    Filter[getFunctionsMetadata - C++ filtering]
    Registry[Velox Function Registry]

    Coordinator --> NamespaceManager1
    Coordinator --> NamespaceManager2
    NamespaceManager1 --> EndpointCatalog
    NamespaceManager2 --> EndpointCatalog
    EndpointCatalog --> Filter
    Filter --> Registry
Loading
  • C++ Backend: Adds /v1/functions/{catalog} endpoint in PrestoServer.cpp that delegates to getFunctionsMetadata(catalog).
  • Filtering Logic: In FunctionMetadata.cpp, functions are filtered by catalog prefix (from fully qualified name: catalog.schema.function_name).
  • Java Plugin: Each NativeFunctionNamespaceManager instance is bound to a catalog and queries only its catalog's functions from the sidecar.

API Specification

  • GET /v1/functions: Returns all functions (unfiltered).
  • GET /v1/functions/{catalog}: Returns only functions from the specified catalog.

Example response for /v1/functions/mycatalog:

{
  "my_custom_function": [
    {
      "outputType": "varchar",
      "paramTypes": ["varchar"],
      "schema": "myschema",
      "functionKind": "SCALAR",
      "routineCharacteristics": {
        "language": {"languageName": "CPP"},
        "determinism": "DETERMINISTIC",
        "nullCallClause": "RETURNS_NULL_ON_NULL_INPUT"
      }
    }
  ]
}

Implementation Details

  • Functions are registered in Velox as catalog.schema.function_name.
  • Filtering is performed by splitting the function name and matching the catalog.
  • The Java plugin does not fall back to the unfiltered endpoint, preventing cross-catalog leakage.
  • Robust error handling and test coverage are included.

Usage Example

Configure multiple function namespace managers:

# etc/function-namespace/mycatalog.properties
function-namespace-manager.name=native
function-implementation-type=CPP
supported-function-languages=CPP

Register a function in C++:

registerFunction<MyCustomFunction, Varchar, Varchar>({"mycatalog.myschema.my_custom_function"});

Security & Performance

  • Prevents cross-catalog leakage by strict filtering and error handling.
  • Minimal performance impact (O(n) filtering, reduced network payload).

Migration & Compatibility

  • Fully backward compatible; existing endpoints and managers continue to work.
  • No migration required; opt-in by using the new endpoint and registering functions with catalog-qualified names.

Future Enhancements

  • Schema-level filtering, configurable blocklist, individual function lookup, caching, and metrics.

If you need further details or want this document in a different format, let me know!

@Joe-Abraham
Copy link
Owner

@sourcery-ai the diagram is not correct

graph TD
Coordinator[Presto Coordinator]
NamespaceManager1[NativeFunctionNamespaceManager - Catalog: mycatalog]
NamespaceManager2[NativeFunctionNamespaceManager - Catalog: presto]
Sidecar[Presto Native Sidecar - C++ Worker]
EndpointAll["GET /v1/functions (all functions)"]
EndpointCatalog["GET /v1/functions/prestodb#123;catalog#125; (filtered)"]
Filter[getFunctionsMetadata - C++ filtering]
Registry[Velox Function Registry]

Coordinator --> NamespaceManager1
Coordinator --> NamespaceManager2
NamespaceManager1 --> EndpointCatalog
NamespaceManager2 --> EndpointAll
EndpointCatalog --> Filter
Filter --> Registry

make the diagram proper based on the implementation on this PR

@Joe-Abraham
Copy link
Owner

@copilot create a diagram with graphTD based on the current implementation, the below is the starter code

graph TD
Coordinator[Presto Coordinator]
NamespaceManager1[NativeFunctionNamespaceManager - Catalog: mycatalog]
NamespaceManager2[NativeFunctionNamespaceManager - Catalog: presto]
Sidecar[Presto Native Sidecar - C++ Worker]
EndpointAll["GET /v1/functions (all functions)"]
EndpointCatalog["GET /v1/functions/prestodb#123;catalog#125; (filtered)"]
Filter[getFunctionsMetadata - C++ filtering]
Registry[Velox Function Registry]

Coordinator --> NamespaceManager1
Coordinator --> NamespaceManager2
NamespaceManager1 --> EndpointCatalog
NamespaceManager2 --> EndpointAll
EndpointCatalog --> Filter
Filter --> Registry

@Joe-Abraham
Copy link
Owner

@copilot add more testcases and change the file names

Copy link
Author

Copilot AI commented Oct 24, 2025

@copilot add more testcases and change the file names

Done! I've improved the e2e test cases and renamed the files. Changes in commit bb9f328:

File Renamings:

  • TestNativeFunctionNamespacesWithoutSidecarTestCatalogFilteredFunctionsWithoutSidecar
  • TestNativeSidecarFunctionNamespacesTestCatalogFilteredFunctionNamespaces

New Test File:

  • TestCatalogFunctionIsolation - Tests catalog isolation and prevents cross-catalog function leakage

Enhanced Test Coverage:

  1. TestCatalogFilteredFunctionsWithoutSidecar - Added tests for multiple catalogs (hive, native, custom)
  2. TestCatalogFilteredFunctionNamespaces - Added 6 new test methods covering:
    • Various inputs (empty strings, numbers, special characters, null values)
    • Complex queries (WHERE clauses, joins, subqueries)
  3. TestCatalogFunctionIsolation - 8 new tests for:
    • Unregistered catalog access prevention
    • Wrong catalog name failures
    • Qualified function name requirements
    • Multiple catalog functions in same query
    • Catalog isolation in subqueries and joins

All test files now have detailed Javadoc comments explaining what they test.

Copilot AI changed the title Document catalog-filtered functions endpoint and add RFC design document Document catalog-filtered functions endpoint, add RFC design document, and improve e2e tests Oct 24, 2025
"supported-function-languages", "CPP",
"function-implementation-type", "CPP"));

// Register hive catalog for hive-specific functions
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot This assumes the PrestoServer has hive connector. Thats a reasonable assumption as we always setup hive connector. But then we should make the C++ PrestoServer code consistent by always registering the hive function as well (and not check if the hive connector is present)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarifying comment explaining that the C++ PrestoServer conditionally registers hive functions only when a hive connector is present (see PrestoToVeloxConnector.cpp:60-63). Since tests always setup the hive connector, the functions will be available. This maintains consistency with the actual C++ implementation behavior. (commit: in progress)


TEST_F(FunctionMetadataTest, GetFunctionsMetadataWithNonExistentCatalog) {
// Test with a catalog that doesn't exist
std::string catalog = "nonexistent";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Remove this temporary variable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the temporary variable. The string literal is now passed directly to getFunctionsMetadata(). (commit: in progress)


// Register hive catalog for hive-specific functions.
// Note: The C++ PrestoServer registers hive functions only when a hive connector is present.
// Since tests always setup the hive connector, hive functions will be available.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot add a check if the hive connector is present

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check using getCatalogHandle() to verify hive connector presence before registering the hive function namespace manager. This maintains consistency with C++ PrestoServer behavior. Changes in commit b611ec5.

Comment on lines 58 to 63
// Register hive-specific functions when hive catalog is detected.
// Delegate to generic Hive native function registrar which is idempotent.
if (connectorName ==
velox::connector::hive::HiveConnectorFactory::kHiveConnectorName ||
connectorName == std::string("hive-hadoop2")) {
hive::functions::registerHiveNativeFunctions();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot This code shouldn't be part of the PrestoToVeloxConnector protocol conversion registration. We should add them in registerFunctions if the hive connector is present https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/PrestoServer.cpp#L1353

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved hive function registration from PrestoToVeloxConnector::registerConnector to PrestoServer::registerFunctions. The functions are now registered when registerFunctions is called if the hive connector factory is available. This properly separates connector protocol registration from function registration. (commit: pending)


// Register hive catalog for hive-specific functions only if hive connector is present.
// Note: The C++ PrestoServer registers hive functions only when a hive connector is present.
if (queryRunner.getMetadata().getCatalogHandle(queryRunner.getDefaultSession(), "hive").isPresent()) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot this is failing with below error

java.lang.IllegalStateException: Not in a transaction
	at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
	at com.facebook.presto.Session.getRequiredTransactionId(Session.java:266)
	at com.facebook.presto.metadata.MetadataManager.getCatalogHandle(MetadataManager.java:983)
	at com.facebook.presto.sidecar.NativeSidecarPluginQueryRunnerUtils.setupNativeSidecarPlugin(NativeSidecarPluginQueryRunnerUtils.java:44)
	at com.facebook.presto.sidecar.NativeSidecarPluginQueryRunner.main(NativeSidecarPluginQueryRunner.java:46)

@Joe-Abraham Joe-Abraham force-pushed the copilot/add-catalog-function-registry branch from 6d26453 to e716dc1 Compare October 27, 2025 07:00
assertQuery("SELECT hive.default.initcap('x')", "SELECT('X')");
assertQuery("SELECT hive.default.initcap('hello123world')", "SELECT('Hello123world')");
assertQuery("SELECT hive.default.initcap('hello-world')", "SELECT('Hello-world')");
assertQuery("SELECT hive.default.initcap(NULL)", "SELECT CAST(NULL AS VARCHAR)");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot All the tests in this class are using initcap with constant values... Due to constant folding optimization all these will be evavluated on the co-ordinator. You should invoke initcap with a column of values for evaluation on the worker. Please add such tests.

@Joe-Abraham Joe-Abraham force-pushed the copilot/add-catalog-function-registry branch from e716dc1 to 4347479 Compare October 27, 2025 07:02
Copilot AI requested a review from Joe-Abraham October 27, 2025 07:06
@Joe-Abraham Joe-Abraham deleted the copilot/add-catalog-function-registry branch October 28, 2025 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants