refactor(cuDF): Separate cuDF query and system configs#16535
refactor(cuDF): Separate cuDF query and system configs#16535pramodsatya wants to merge 1 commit intofacebookincubator:mainfrom
Conversation
✅ Deploy Preview for meta-velox ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
ed7ddbb to
c639cc5
Compare
d2cf7b3 to
d7489a6
Compare
| /// Clients can disable here and enable it via the QueryConfig as well. | ||
| bool enabled{true}; | ||
| /// Retrieve all config keys currently in the map. | ||
| std::vector<std::string> getConfigKeys() const; |
There was a problem hiding this comment.
When do you need the keys?
There was a problem hiding this comment.
These velox config keys are mapped to Presto session properties in prestodb/presto#27204.
| std::string memoryResource{"async"}; | ||
| /// Retrieve config value by key, fails if config does not exist. | ||
| template <typename T> | ||
| T get(const std::string& key) const { |
There was a problem hiding this comment.
remove since super class has this API
There was a problem hiding this comment.
super class does not have the same API, could you please clarify?
I found super class API that also expects a default value to be provided, this overload simplifies get and can be used when we are certain the config exists (such as in testcases) and get will not return std::nullopt.
There was a problem hiding this comment.
I think we should follow the existing config usage pattern in S3, Config.h, etc. to keep configs consistent.
| @@ -398,7 +398,7 @@ CudfHashJoinProbe::CudfHashJoinProbe( | |||
| "Join field {} not in probe or build input", outputType->children()[i]); | |||
| } | |||
|
|
|||
| if (CudfConfig::getInstance().debugEnabled) { | |||
| if (CudfConfig::getInstance().get<bool>(CudfConfig::kCudfDebugEnabled)) { | |||
There was a problem hiding this comment.
But I wonder if we should use as this, as the QueryConfig, it has per function for per config and return the specified data type
There was a problem hiding this comment.
I feel this is cleaner and having a separate getter method for each config is a maintenance overhead. I further changed the config entries to be of type CudfConfigEntry for better config metadata maintenance.
Could you please take another look @jinchengchenghh and share your thoughts on this?
The metadata fields added to CudfConfigEntry are required for prestodb/presto#27204.
d7489a6 to
6af02d4
Compare
velox/common/config/Config.h
Outdated
| @@ -118,6 +119,12 @@ class ConfigBase : public IConfig { | |||
|
|
|||
| const std::unordered_map<std::string, std::string>& rawConfigs() const; | |||
|
|
|||
| /// Get the TypePtr for a config key. Subclasses can override to provide type | |||
| /// info. | |||
| virtual velox::TypePtr getType(const std::string& key) const { | |||
| /// Represents a single configuration entry with its metadata. | ||
| struct CudfConfigEntry { | ||
| std::string name; // Config key (e.g., "cudf.enabled") | ||
| velox::TypePtr type; // Type of the configuration value |
There was a problem hiding this comment.
Where do we use this type field?
| std::string memoryResource{"async"}; | ||
| /// Retrieve config value by key, fails if config does not exist. | ||
| template <typename T> | ||
| T get(const std::string& key) const { |
There was a problem hiding this comment.
I think we should follow the existing config usage pattern in S3, Config.h, etc. to keep configs consistent.
| @@ -255,7 +256,8 @@ RowVectorPtr CudfFilterProject::getOutput() { | |||
| stream.synchronize(); | |||
| auto const numColumns = outputTable->num_columns(); | |||
| auto const size = outputTable->num_rows(); | |||
| if (CudfConfig::getInstance().debugEnabled) { | |||
| if (CudfConfig::getInstance().get<bool>( | |||
There was a problem hiding this comment.
This should be wrapped in a common function where we check if the session property exists and return that or the core config.
8277308 to
d90088f
Compare
|
Thanks for the feedback @majetideepak, addressed the review comments and updated the PR to separate out cuDF system and query level configs as discussed. The cuDF query configs are now passed via |
| topNNode_(topNNode), | ||
| kBatchSize_(CudfConfig::getInstance().topNBatchSize), | ||
| kBatchSize_( | ||
| operatorCtx_->execCtx()->queryCtx()->cudfConfig().topNBatchSize()), |
There was a problem hiding this comment.
This should be
topNBatchSize(operatorCtx_->execCtx()->queryCtx());
Inside topNBatchSize(), we check if the queryCtx has the session key cudf_topn_batch_size" or use the system config.
There was a problem hiding this comment.
Updated, thanks.
| /// | ||
| /// Example: "cudf.allow_cpu_fallback" and "cudf.allow-cpu-fallback" are | ||
| /// normalized to canonical internal key "cudf.allow-cpu-fallback". | ||
| void updateConfigs(std::unordered_map<std::string, std::string>&&); |
| CudfSystemConfig(); | ||
|
|
||
| /// Constructor that initializes config from a map of values. | ||
| explicit CudfSystemConfig( |
| return instance; | ||
| } | ||
|
|
||
| void CudfSystemConfig::validateConfigs() { |
There was a problem hiding this comment.
remove this function and add the body to updateConfigs.
| void CudfSystemConfig::updateConfigs( | ||
| std::unordered_map<std::string, std::string>&& config) { | ||
| for (auto& [key, value] : config) { | ||
| // TODO(ps): Revert support for legacy config names. |
There was a problem hiding this comment.
Add an example of legacy config.
There was a problem hiding this comment.
Throw a warning for legacy config?
| auto const& geLower = tree.push(Operation{Op::GREATER_EQUAL, value, lower}); | ||
| auto const& leUpper = tree.push(Operation{Op::LESS_EQUAL, value, upper}); | ||
| return tree.push(Operation{Op::NULL_LOGICAL_AND, geLower, leUpper}); | ||
| VELOX_CHECK_EQ(len, 3); |
majetideepak
left a comment
There was a problem hiding this comment.
few nits and cleanup pending.
Separates out cuDF configs into system and query level configs. cuDF query configs are moved to
core/QueryConfigand accessed throughqueryCtx, avoiding the use of singleton instance and allowing for different configs to be propagated in different sessions via Presto C++.Extracts
CudfConfiginto a dedicatedvelox_cudf_configlibrary within a newcommon/subdirectory to enable external engines to consume the config module independently of exec operators.Key Changes
CudfSystemConfigand cuDFQueryConfigsCudfSystemConfig, while retaining backwards compatibility for underscored config namesCudfConfigAPI: Replace specialized template methods with two generic getters (get<T>(),getOptional<T>()) with automatic type conversionvelox_cudf_configlibraryBenefits
Enables other engines to set cuDF
QueryConfigs cleanly, such as via session properties in Presto, and without an exec operator dependency.