Skip to content

Conversation

@nuno-faria
Copy link
Contributor

Which issue does this PR close?

  • None

Rationale for this change

The documentation for the datafusion.execution.target_partitions and datafusion.execution.planning_concurrency settings states that their default values are set to 0 (which in these cases maps to the number of cores). However, using 0 does not translate to their default values, but rather the literal 0, which is invalid in both cases.

In a previous version, setting datafusion.execution.target_partitions to 0 resulted in a single partition, while in the current version it results in a division by zero error. Likewise, setting datafusion.execution.planning_concurrency to 0 causes the system to hang.

With this change, setting datafusion.execution.target_partitions or datafusion.execution.planning_concurrency to 0 will result in the default value being assigned instead, making it coherent with the documentation.

What changes are included in this PR?

  • datafusion/common/config - added the translation to the default values.
  • datafusion/execution/config - updated the with_target_partitions method.
  • sqllogictest/information_schema - added tests.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes -- when setting these configurations to 0 -- but it will match the existing documentation.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate execution Related to the execution crate labels Apr 14, 2025
@ctsk
Copy link
Contributor

ctsk commented Apr 15, 2025

Sensible changes, but I'm unsure of the implementation. I've had a quick look at the config_namespace! macro and see that there is a transform can be specified - Could that be used here?

@nuno-faria
Copy link
Contributor Author

Sensible changes, but I'm unsure of the implementation. I've had a quick look at the config_namespace! macro and see that there is a transform can be specified - Could that be used here?

@ctsk Something like this?

pub target_partitions: usize, transform = |x: &str| if x == "0" ...

It appears to work well, however with_target_partitions must be changed to call set instead of assigning directly to execution.target_partitions.

Should transform = ... be a closure or a new standalone function?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @nuno-faria and @ctsk for getting the PR to a nicely mergable state

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

@alamb
Copy link
Contributor

alamb commented Apr 15, 2025

Merged up from main to try and get a clean CI rnu

@alamb
Copy link
Contributor

alamb commented Apr 16, 2025

Thanks again @nuno-faria -- FYI this will be in the 48 release

@alamb alamb merged commit a9a1319 into apache:main Apr 16, 2025
27 checks passed
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
…urrency (apache#15712)

* Enable setting default values for target_partitions and planning_concurrency

* Fix doc test

* Use transform to apply the mapping from 0 to the default parallelism

---------

Co-authored-by: Andrew Lamb <[email protected]>
@nuno-faria nuno-faria deleted the config_defaults branch September 17, 2025 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate execution Related to the execution crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants