feat(native): Add cuDF session property provider#27204
feat(native): Add cuDF session property provider#27204pramodsatya wants to merge 3 commits intoprestodb:masterfrom
Conversation
Reviewer's GuideRefactors native session property handling into a reusable provider framework, relocates session property code into a dedicated module, and adds cuDF-specific session properties that are surfaced via the native sidecar endpoint when GPU support is enabled, along with tests and CI wiring. Sequence diagram for sidecar sessionProperties endpoint with optional cuDF supportsequenceDiagram
actor JavaCoordinator
participant SidecarPlugin
participant NativeWorker
participant PrestoServer
participant SessionProps as SessionPropertiesProvider
JavaCoordinator->>SidecarPlugin: request session properties
SidecarPlugin->>NativeWorker: HTTP GET /v1/sessionProperties
NativeWorker->>PrestoServer: dispatch request
alt PRESTO_ENABLE_CUDF defined
PrestoServer->>SessionProps: CudfSessionProperties::instance()
else
PrestoServer->>SessionProps: SessionProperties::instance()
end
SessionProps-->>PrestoServer: serialize()
PrestoServer-->>NativeWorker: JSON session properties
NativeWorker-->>SidecarPlugin: HTTP 200 OK
SidecarPlugin-->>JavaCoordinator: native session properties metadata
Class diagram for session properties provider framework and cuDF session propertiesclassDiagram
class SessionProperty {
-protocol_SessionPropertyMetadata metadata_
-optional_string veloxConfig_
-string value_
+SessionProperty(name, description, typeSignature, hidden, veloxConfig, defaultValue)
+protocol_SessionPropertyMetadata getMetadata()
+optional_string getVeloxConfig()
+string getValue()
+void updateValue(value)
+bool operator==(other)
}
class SessionPropertiesProvider {
<<abstract>>
-unordered_map_string_SessionPropertyPtr sessionProperties_
+~SessionPropertiesProvider()
+string toVeloxConfig(name) const
+json serialize() const
+bool hasVeloxConfig(key)
+void updateSessionPropertyValue(key, value)
#void addSessionProperty(name, description, type, isHidden, veloxConfig, defaultValue)
}
class SessionProperties {
<<singleton>>
+static SessionProperties* instance()
+SessionProperties()
+bool useVeloxGeospatialJoin() const
}
class CudfSessionProperties {
<<singleton>>
+static CudfSessionProperties* instance()
+CudfSessionProperties()
+static const char* kCudfEnabled
+static const char* kCudfDebugEnabled
+static const char* kCudfMemoryResource
+static const char* kCudfMemoryPercent
+static const char* kCudfFunctionNamePrefix
+static const char* kCudfAstExpressionEnabled
+static const char* kCudfAstExpressionPriority
+static const char* kCudfJitExpressionEnabled
+static const char* kCudfJitExpressionPriority
+static const char* kCudfAllowCpuFallback
+static const char* kCudfLogFallback
}
class CudfConfig {
<<external>>
+static CudfConfig& getInstance()
+bool enabled
+bool debugEnabled
+string memoryResource
+int memoryPercent
+string functionNamePrefix
+bool astExpressionEnabled
+int astExpressionPriority
+bool jitExpressionEnabled
+int jitExpressionPriority
+bool allowCpuFallback
+bool logFallback
}
SessionPropertiesProvider o--> SessionProperty
SessionProperties --|> SessionPropertiesProvider
CudfSessionProperties --|> SessionPropertiesProvider
CudfSessionProperties ..> CudfConfig
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
3499943 to
4cb6d41
Compare
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In PrestoServer::registerSidecarEndpoints, when PRESTO_ENABLE_CUDF is set you completely replace the native SessionProperties with CudfSessionProperties; consider merging/combining the two so the sidecar exposes both standard native and cuDF session properties instead of dropping the former when GPUs are enabled.
- The CMake setup builds CudfSessionProperties.cpp into both presto_session_properties and presto_cudf_session_properties, and the CUDF tests link both libraries; this likely produces duplicate symbol/ODR issues, so it would be cleaner to have CudfSessionProperties compiled into exactly one library and have the other target depend on it rather than re-listing the same sources.
- CudfSessionProperties defines kCudfTopNBatchSize but never registers it in the constructor or validates it in tests, which makes the constant dead and potentially misleading; either wire it up as an actual session property or remove it until it is going to be used.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In PrestoServer::registerSidecarEndpoints, when PRESTO_ENABLE_CUDF is set you completely replace the native SessionProperties with CudfSessionProperties; consider merging/combining the two so the sidecar exposes both standard native and cuDF session properties instead of dropping the former when GPUs are enabled.
- The CMake setup builds CudfSessionProperties.cpp into both presto_session_properties and presto_cudf_session_properties, and the CUDF tests link both libraries; this likely produces duplicate symbol/ODR issues, so it would be cleaner to have CudfSessionProperties compiled into exactly one library and have the other target depend on it rather than re-listing the same sources.
- CudfSessionProperties defines kCudfTopNBatchSize but never registers it in the constructor or validates it in tests, which makes the constant dead and potentially misleading; either wire it up as an actual session property or remove it until it is going to be used.
## Individual Comments
### Comment 1
<location path="presto-native-execution/presto_cpp/main/properties/session/CudfSessionProperties.h" line_range="65-72" />
<code_context>
+ /// Log reasons for CPU fallback
+ static constexpr const char* kCudfLogFallback = "cudf_log_fallback";
+
+ /// Batch size used by cuDF TopN operator before merging partial results
+ static constexpr const char* kCudfTopNBatchSize = "cudf_topn_batch_size";
+
+ /// Get singleton instance
</code_context>
<issue_to_address>
**suggestion (bug_risk):** kCudfTopNBatchSize is declared but not initialized in CudfSessionProperties, which may indicate a missing property registration.
This constant is defined but never registered via addSessionProperty in the CudfSessionProperties constructor. If TopN batch size should be session-configurable, please hook it up there like the other properties. If not, consider removing it to avoid implying a configurable setting that doesn’t exist.
```suggestion
/// Log reasons for CPU fallback
static constexpr const char* kCudfLogFallback = "cudf_log_fallback";
/// Get singleton instance
static CudfSessionProperties* instance();
```
</issue_to_address>
### Comment 2
<location path="presto-native-execution/presto_cpp/main/properties/session/tests/CudfSessionPropertiesTest.cpp" line_range="63-70" />
<code_context>
+ const auto sessionProps = CudfSessionProperties::instance();
+ const auto& serialized = sessionProps->serialize();
+
+ // Helper to find property value by name
+ auto findPropertyValue = [&serialized](const std::string& name) {
+ for (const auto& prop : serialized) {
+ if (prop["name"] == name) {
+ return prop["defaultValue"].get<std::string>();
+ }
+ }
+ return std::string();
+ };
+
</code_context>
<issue_to_address>
**suggestion (testing):** Tighten `defaultValuesMatchConfig` to assert properties exist before comparing default values
In `defaultValuesMatchConfig`, `findPropertyValue` returns an empty string when a property is missing, so failures appear as default mismatches rather than clearly indicating a missing/misnamed property. Adding an explicit assertion (e.g., `ASSERT_FALSE(result.empty())`) for expected properties would make test failures clearly signal that the property wasn’t found.
Suggested implementation:
```cpp
#include <optional>
#include "presto_cpp/main/properties/session/CudfSessionProperties.h"
#include "velox/experimental/cudf/CudfConfig.h"
```
```cpp
const auto sessionProps = CudfSessionProperties::instance();
const auto& serialized = sessionProps->serialize();
// Helper to find property value by name
auto findPropertyValue =
[&serialized](const std::string& name) -> std::optional<std::string> {
for (const auto& prop : serialized) {
if (prop["name"] == name) {
return prop["defaultValue"].get<std::string>();
}
}
return std::nullopt;
};
```
To fully implement the suggestion “assert properties exist before comparing default values”, you should update the code *inside* `defaultValuesMatchConfig` wherever `findPropertyValue` is used. For each expected property:
1. Replace direct uses like:
```c++
const auto actualDefault = findPropertyValue(expectedName);
EXPECT_EQ(expectedDefault, actualDefault);
```
with:
```c++
const auto actualDefault = findPropertyValue(expectedName);
ASSERT_TRUE(actualDefault.has_value())
<< "Property " << expectedName << " not found in session properties";
EXPECT_EQ(expectedDefault, *actualDefault);
```
2. Ensure this pattern is applied for every expected property the test checks, so a missing/misnamed property causes the `ASSERT_TRUE` to fire with a clear “not found” message instead of being reported as a value mismatch.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-native-execution/presto_cpp/main/properties/session/CudfSessionProperties.h
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/properties/session/tests/CudfSessionPropertiesTest.cpp
Outdated
Show resolved
Hide resolved
|
@majetideepak, could you help review this change? This is 1/3 changes for cuDF sidecar support and adds cuDF session property provider. cuDF functions and plan validation will be added in subsequent PRs (ref: #27164) |
|
@sourcery-ai review. |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The
CudfSessionPropertiesTest.propertiesInitializedtest assertsserialized.size() == 11, which will make the test fragile when adding or removing CUDF session properties; consider checking only for the presence of the expected property names (as you already do) and dropping the hard-coded count. SessionPropertiesProvider.cppis compiled into bothpresto_session_propertiesandpresto_cudf_session_properties; to avoid duplicate compilation and potential ODR/linking surprises, consider factoring it into a separate common library that both targets link against instead of compiling the same source into two libraries.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `CudfSessionPropertiesTest.propertiesInitialized` test asserts `serialized.size() == 11`, which will make the test fragile when adding or removing CUDF session properties; consider checking only for the presence of the expected property names (as you already do) and dropping the hard-coded count.
- `SessionPropertiesProvider.cpp` is compiled into both `presto_session_properties` and `presto_cudf_session_properties`; to avoid duplicate compilation and potential ODR/linking surprises, consider factoring it into a separate common library that both targets link against instead of compiling the same source into two libraries.
## Individual Comments
### Comment 1
<location path="presto-native-execution/presto_cpp/main/PrestoServer.cpp" line_range="79-83" />
<code_context>
#include "velox/serializers/UnsafeRowSerializer.h"
#ifdef PRESTO_ENABLE_CUDF
+#include "presto_cpp/main/properties/session/CudfSessionProperties.h"
#include "velox/experimental/cudf/CudfConfig.h"
#include "velox/experimental/cudf/exec/ToCudf.h"
#endif
</code_context>
<issue_to_address>
**issue (bug_risk):** Sidecar /v1/native/session/properties now returns only cuDF properties when cuDF is enabled, dropping the existing native session properties.
With PRESTO_ENABLE_CUDF enabled, this endpoint now serializes only `CudfSessionProperties` instead of the full `SessionProperties`, so `/v1/native/session/properties` no longer returns the complete native worker properties. If callers rely on the full set, this is a behavioral regression.
Please either merge both `SessionProperties` and `CudfSessionProperties` into one response, or keep the original payload here and surface cuDF session properties via a separate or aggregating provider, so enabling cuDF doesn’t change the existing API contract unexpectedly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
7cf904b to
c99e19a
Compare
…le (#16528) Summary: Merge cuDF query-level configuration from `CudfQueryConfig.h` into `CudfConfig.h` to establish a unified configuration model for all cuDF settings. This centralizes config initialization, declaration, and access patterns in a single structural anchor. ### Motivation Centralizing cuDF configuration into one file simplifies the architecture by: - Eliminating fragmented configuration declarations across multiple headers - Consolidating config initialization logic in a single place - Establishing a clear, single source of truth for all cuDF parameters - Making it straightforward for higher-level systems (like Presto) to discover and expose cuDF settings (see prestodb/presto#27204). Pull Request resolved: #16528 Reviewed By: jainxrohit Differential Revision: D94395708 Pulled By: kKPulla fbshipit-source-id: 54e4f397cd77e38069d90ce75bcf1de30eaaa5f3
e285ee9 to
ea16adf
Compare
ea16adf to
9c2abad
Compare
## Description Refactors session properties provider into a base class. ## Motivation and Context Different native execution backends like `cuDF` can provide their own mapping of Presto to Velox configs in order to integrate Velox configs into Presto with the native session property provider ([ref](https://prestodb.io/docs/current/plugin/native-sidecar-plugin.html#session-properties)). Required by #27204. ## Impact N/A. ## Test Plan Verified with existing e2e tests. ``` == NO RELEASE NOTE == ``` ## Summary by Sourcery Refactor native session properties into a reusable provider base class and reorganize related code into a dedicated properties module. Enhancements: - Introduce a SessionPropertiesProvider base class encapsulating shared session property management, serialization, and Velox config mapping logic for reuse by different backends. - Update existing SessionProperties to inherit from the new provider and relocate session property code under a dedicated properties/session directory. - Extract a common bool-to-string utility into the shared Utils module for reuse across components. Build: - Create a new presto_session_properties library, adjust main CMake configuration to use it, and add properties-specific CMake targets. Tests: - Move native session properties tests into a dedicated properties/session test target and wire them into CMake.
…le (facebookincubator#16528) Summary: Merge cuDF query-level configuration from `CudfQueryConfig.h` into `CudfConfig.h` to establish a unified configuration model for all cuDF settings. This centralizes config initialization, declaration, and access patterns in a single structural anchor. ### Motivation Centralizing cuDF configuration into one file simplifies the architecture by: - Eliminating fragmented configuration declarations across multiple headers - Consolidating config initialization logic in a single place - Establishing a clear, single source of truth for all cuDF parameters - Making it straightforward for higher-level systems (like Presto) to discover and expose cuDF settings (see prestodb/presto#27204). Pull Request resolved: facebookincubator#16528 Reviewed By: jainxrohit Differential Revision: D94395708 Pulled By: kKPulla fbshipit-source-id: 54e4f397cd77e38069d90ce75bcf1de30eaaa5f3
Description
This change makes CUDF configurations in Velox configurable as Presto session properties by leveraging the session property provider from the native sidecar plugin. This eliminates the need to restart workers when updating CUDF configurations - they can now be applied dynamically via session properties on a per-query basis.
Implementation details:
CudfSessionPropertiesclass that exposes cuDF configurations as Presto session propertiesPRESTO_ENABLE_CUDF=ON)Depends on #27253.
Depends on facebookincubator/velox#16535.
Motivation and Context
CUDF configuration previously required:
This approach made it difficult to:
By exposing these as Presto session properties, developers and operators can now:
SET SESSIONImpact
Public API Changes:
Compatibility:
Test Plan
Unit Tests:
CudfSessionPropertiesclassIntegration Tests:
TestCudfSidecarPluginthat validates:SET SESSIONcommandsCudfConfigduring query executionCI Testing:
Manual Testing:
SHOW SESSIONcorrectly displays all CUDF session propertiesContributor checklist
Release Notes
Summary by Sourcery
Expose cuDF GPU configuration as native session properties and route the sidecar session property endpoint through a pluggable session-properties provider, with optional cuDF-specific implementation when GPU support is enabled.
New Features:
Enhancements:
Build:
CI:
Tests: