forked from apache/datafusion
[QDTP-791] Sync to upstream to bring the new topK optim changes from Geoffrey #22
Closed
…urrency (apache#15712) * Enable setting default values for target_partitions and planning_concurrency * Fix doc test * Use transform to apply the mapping from 0 to the default parallelism --------- Co-authored-by: Andrew Lamb <[email protected]>
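The commit above maps a configured value of 0 for `target_partitions` and `planning_concurrency` to the default parallelism. Below is a minimal sketch of that mapping in plain Rust. `resolve_parallelism` and `default_parallelism` are hypothetical names, not DataFusion's configuration API. Using `std::thread::available_parallelism` as the default, with a fallback of 1, is also an assumption.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: a configured value of 0 means "use the default
/// parallelism"; any other value is taken as-is.
fn resolve_parallelism(configured: usize, default_parallelism: usize) -> usize {
    if configured == 0 {
        default_parallelism
    } else {
        configured
    }
}

/// Number of CPUs the process may use, falling back to 1 if it cannot be queried
/// (assumed default for illustration).
fn default_parallelism() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}
```

With this approach, a user can write `0` in a config file or `SET` statement to mean "match this machine" without knowing the CPU count in advance.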
* minor * fix
Bumps [http-proxy-middleware](https://github.com/chimurai/http-proxy-middleware) from 2.0.6 to 2.0.9. - [Release notes](https://github.com/chimurai/http-proxy-middleware/releases) - [Changelog](https://github.com/chimurai/http-proxy-middleware/blob/v2.0.9/CHANGELOG.md) - [Commits](chimurai/http-proxy-middleware@v2.0.6...v2.0.9) --- updated-dependencies: - dependency-name: http-proxy-middleware dependency-version: 2.0.9 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ParserError->DataFusionError+attach a diagnostic * fix: ci * fix: fmt * fix:clippy * does this fix ci test? * this fixes sqllogictest * fix: cargo test * fix: fmt * add tests * cleanup * suggestions + expect EOF nicely * fix: clippy
…5594) * Set DataFusion runtime configurations through SQL interface * fix clippy warnings * use spill count based tests for checking applied memory limit --------- Co-authored-by: Andrew Lamb <[email protected]>
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.171 to 0.2.172. - [Release notes](https://github.com/rust-lang/libc/releases) - [Changelog](https://github.com/rust-lang/libc/blob/0.2.172/CHANGELOG.md) - [Commits](rust-lang/libc@0.2.171...0.2.172) --- updated-dependencies: - dependency-name: libc dependency-version: 0.2.172 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Refactor regexp slt tests * handle null test data
… them (apache#15566) * ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them * wip * fix tests * fix * fix * fix doc * fix doc * Improve doc comments of `filter-pushdown-apis` (#22) * Improve doc comments * Apply suggestions from code review --------- Co-authored-by: Adrian Garcia Badaracco <[email protected]> * Apply suggestions from code review Co-authored-by: Andrew Lamb <[email protected]> * simplify according to pr feedback * Add missing file * Add tests * pipe config in * docstrings * Update datafusion/physical-plan/src/filter_pushdown.rs * fix * fix * fmt * fix doc * add example usage of config * fix test * convert exec API and optimizer rule * re-add docs * dbg * dbg 2 * avoid clones * part 3 * fix lint * tests pass * Update filter.rs * update projection tests * update slt files * fix * fix references * improve impls and update tests * apply stop logic * update slt's * update other tests * minor * rename modules to match logical optimizer, tweak docs --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Berkay Şahin <[email protected]>
* flatten array in a single step instead of recursive * clippy * update flatten type signature to Array * add fixed list to list coercion to flatten signature * support LargeList(List) and LargeList(FixedSizeList) in flatten * add test for LargeList(FixedSizeList) * handle nulls * uncomment flatten(NULL) test - it already works
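Flattening in a single step can be illustrated with Arrow's list layout, in which each row is a slice of a child array delimited by offsets. Below is a minimal sketch that removes one level of nesting per call and ignores nulls. It composes the outer list's offsets with the inner list's offsets, so no recursion or value copying is needed. `ListLayout` and `flatten_one_level` are hypothetical types for illustration. They are not the `arrow` crate's API or DataFusion's implementation.

```rust
/// A minimal Arrow-style list layout: `offsets[i]..offsets[i + 1]` delimits
/// row `i` within `values`.
#[derive(Debug, Clone, PartialEq)]
struct ListLayout<T> {
    offsets: Vec<usize>,
    values: T,
}

/// Flattens `List<List<T>>` into `List<T>` by composing offsets. Row `i` of
/// the outer list spans inner lists `outer[i]..outer[i + 1]`, which in turn
/// span values `inner[outer[i]]..inner[outer[i + 1]]`.
fn flatten_one_level<T>(outer: ListLayout<ListLayout<T>>) -> ListLayout<T> {
    let inner = outer.values;
    let offsets = outer.offsets.iter().map(|&o| inner.offsets[o]).collect();
    ListLayout {
        offsets,
        values: inner.values,
    }
}
```

For example, the rows `[[1, 2], [3]]` and `[[4]]` become `[1, 2, 3]` and `[4]`. The child values buffer is reused unchanged.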
…ation (apache#15694) * Enhance short-circuit evaluation for binary expressions - Delay evaluation of the right-hand side (RHS) unless necessary. - Optimize short-circuiting for `Operator::And` and `Operator::Or` by checking LHS alone first. - Introduce `get_short_circuit_result` function to determine short-circuit conditions based on LHS and RHS. - Update tests to cover various short-circuit scenarios for both `AND` and `OR` operations. * refactor: rename test_check_short_circuit to test_get_short_circuit_result and update assertions - Renamed the test function for clarity. - Updated assertions to use get_short_circuit_result instead of check_short_circuit. - Added additional test cases for AND and OR operations with expected results. * fix: enhance short-circuit evaluation logic in get_short_circuit_result function for null - Updated AND and OR short-circuit conditions to only trigger when all values are either false or true, respectively, and there are no nulls in the array. - Adjusted test case to reflect the change in expected output. 
* feat: add debug logging for binary expression evaluation and short-circuit checks * fix: improve short-circuit evaluation logic in BinaryExpr to ensure RHS is only evaluated when necessary * fix: restrict short-circuit evaluation to logical operators in get_short_circuit_result function * add more println!("==> "); * fix: remove duplicate data type checks for left and right operands in BinaryExpr evaluation * feat: add debug prints for dictionary values and keys in binary expression tests * Tests pass * fix: remove redundant short-circuit evaluation check in BinaryExpr and enhance documentation for get_short_circuit_result * refactor: remove unnecessary debug prints and streamline short-circuit evaluation in BinaryExpr * test: enhance short-circuit evaluation tests for nullable and scalar values in BinaryExpr * add benchmark * refactor: improve short-circuit logic in BinaryExpr for logical operators - Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result` function. - Updated handling of Boolean data types to return `None` for null values. - Simplified short-circuit checks for AND/OR operations by consolidating logic. - Enhanced readability and maintainability of the code by restructuring match statements. * refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations * Revert "refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations" This reverts commit a62df47. * bench: add benchmark for OR operation with all false values in short-circuit evaluation * refactor: add ShortCircuitStrategy enum to optimize short-circuit evaluation in BinaryExpr - Replaced the lazy evaluation of the right-hand side (RHS) with immediate evaluation based on short-circuiting logic. - Introduced a new function `check_short_circuit` to determine if short-circuiting can be applied for logical operators. 
- Updated the logic to return early for `Operator::And` and `Operator::Or` based on the evaluation of the left-hand side (LHS) and the conditions of the RHS. - Improved clarity and efficiency of the short-circuit evaluation process by eliminating unnecessary evaluations. * refactor: simplify short-circuit evaluation logic in check_short_circuit function * datafusion_expr::lit as expr_lit * refactor: optimize short-circuit evaluation in check_short_circuit function - Simplified logic for AND/OR operations by prioritizing false/true counts to enhance performance. - Updated documentation to reflect changes in array handling techniques. * refactor: add count_boolean_values helper function and optimize check_short_circuit logic - Introduced a new helper function `count_boolean_values` to count true and false values in a BooleanArray, improving readability and performance. - Updated `check_short_circuit` to utilize the new helper function for counting, reducing redundant operations and enhancing clarity in the evaluation logic for AND/OR operations. - Adjusted comments for better understanding of the short-circuiting conditions based on the new counting mechanism. * Revert "refactor: add count_boolean_values helper function and optimize check_short_circuit logic" This reverts commit e2b9f77. * optimise evaluate * optimise evaluate 2 * refactor op:AND, lhs all false op:OR, lhs all true to be faster * fix clippy warning * refactor: optimize short-circuit evaluation logic in check_short_circuit function * fix clippy warning * add pre selection * add some comments * [WIP] fix pre-selection result * fix: Error in calculating the ratio * fix: Correct typo in pre-selection threshold constant and improve pre-selection scatter function documentation * fix doctest error * fix cargo doc * fix cargo doc * test: Add unit tests for pre_selection_scatter function --------- Co-authored-by: Siew Kam Onn <[email protected]>
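The core short-circuit idea described in this commit can be reduced to a small, dependency-free sketch. It makes two assumptions: boolean columns are modeled as `Vec<Option<bool>>` (with `None` as SQL NULL) instead of Arrow `BooleanArray`, and the RHS is a lazily evaluated closure. `Op`, `can_short_circuit`, `kleene`, and `evaluate` are hypothetical names. The sketch covers only the case where the LHS alone decides the result. It does not model the PR's `ShortCircuitStrategy` enum or its pre-selection path.

```rust
/// Simplified stand-in for the logical operators handled by `BinaryExpr`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Or,
}

/// True when the LHS alone determines the result: every value is `false` for
/// AND (or `true` for OR) and there are no nulls.
fn can_short_circuit(op: Op, lhs: &[Option<bool>]) -> bool {
    let decisive = match op {
        Op::And => false,
        Op::Or => true,
    };
    lhs.iter().all(|v| *v == Some(decisive))
}

/// SQL three-valued (Kleene) logic for a single row.
fn kleene(op: Op, l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (op, l, r) {
        (Op::And, Some(false), _) | (Op::And, _, Some(false)) => Some(false),
        (Op::And, Some(true), Some(true)) => Some(true),
        (Op::Or, Some(true), _) | (Op::Or, _, Some(true)) => Some(true),
        (Op::Or, Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Evaluates `lhs op rhs`, calling `rhs` only when the LHS does not decide the result.
fn evaluate<F>(op: Op, lhs: Vec<Option<bool>>, rhs: F) -> Vec<Option<bool>>
where
    F: FnOnce() -> Vec<Option<bool>>,
{
    if can_short_circuit(op, &lhs) {
        return lhs;
    }
    let rhs = rhs();
    lhs.into_iter()
        .zip(rhs)
        .map(|(l, r)| kleene(op, l, r))
        .collect()
}
```

The no-nulls condition matters because `NULL AND x` is `false` only when `x` is `false`, and NULL otherwise. A LHS that contains nulls therefore cannot settle the result on its own.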
* fix: serialize listing table without partition column * remove unwrap * format * clippy
…e#15726) * coerce FixedSizeBinary to Binary * simplify FixedSizeBytes equality to literal * fix clippy * remove redundant ExprSimplifier case * Add explain test to make sure unwrapping is working correctly --------- Co-authored-by: Andrew Lamb <[email protected]>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.35 to 4.5.36. - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](clap-rs/clap@clap_complete-v4.5.35...clap_complete-v4.5.36) --- updated-dependencies: - dependency-name: clap dependency-version: 4.5.36 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add DataFusion 47.0.0 Upgrade Guide * prettier * Update docs/source/library-user-guide/upgrading.md Co-authored-by: Oleks V <[email protected]> * Update docs/source/library-user-guide/upgrading.md Co-authored-by: Oleks V <[email protected]> * Fix examples * Try and fix tests again --------- Co-authored-by: Oleks V <[email protected]>
* Support Accumulator for avg duration * Add tests
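An AVG accumulator over durations follows the usual update / merge / evaluate shape. Below is a plain-Rust sketch of that shape. It assumes durations are signed 64-bit integers in one fixed unit, as in Arrow duration arrays, and that the average truncates toward zero. `DurationAvg` is hypothetical. DataFusion's real `Accumulator` trait operates on Arrow arrays and `ScalarValue`, not on slices like these.

```rust
/// Hypothetical, simplified AVG accumulator for duration values. Values are
/// signed 64-bit integers in one fixed unit (e.g. nanoseconds); nulls are skipped.
#[derive(Debug, Default)]
struct DurationAvg {
    sum: i128, // wider than i64 so summing many large durations does not overflow in practice
    count: u64,
}

impl DurationAvg {
    /// Adds a batch of (possibly null) values.
    fn update(&mut self, values: &[Option<i64>]) {
        for v in values.iter().flatten() {
            self.sum += i128::from(*v);
            self.count += 1;
        }
    }

    /// Combines partial state, e.g. from another partition.
    fn merge(&mut self, other: &DurationAvg) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Average in the input unit (truncated toward zero), or `None` if there
    /// were no non-null values.
    fn evaluate(&self) -> Option<i64> {
        if self.count == 0 {
            None
        } else {
            Some((self.sum / i128::from(self.count)) as i64)
        }
    }
}
```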
* Improve simplify_expressions rule * address comments * address comments
* doc:Add documentation for OPTIONS clause syntax * doc:rename write_options.md to format_options.md and clarify its scope for both reading and writing * doc: change dml.md, cuz still have wrong write_options filename * doc: update doctest reference to renamed format_options.md * docs: update and correct format options documentation * doc: add more information of options content * remove execution settings, move note about insert * wordsmith example --------- Co-authored-by: Andrew Lamb <[email protected]>
* fix: parquet coerce_int96 schema * move test to parquet.slt * update based on comphead's suggestion
…age (apache#15644) * Show current SQL recursion limit in RecursionLimitExceeded error message * use recursion_limit setting from sql-parser-options * resolve merge conflicts * move error handling code to helper method
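The point of this change is that the error message reports the configured recursion limit, which is taken from the SQL parser options, so users know which value to raise. Below is a toy sketch of that behavior. It uses parenthesis depth as a stand-in for parser recursion depth. `check_nesting` and its error text are hypothetical. The real limit is enforced inside the SQL parser, not by counting parentheses.

```rust
/// Hypothetical depth check: returns the deepest parenthesis nesting in `sql`,
/// or an error that reports the configured limit once it is exceeded.
fn check_nesting(sql: &str, recursion_limit: usize) -> Result<usize, String> {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    for c in sql.chars() {
        match c {
            '(' => {
                depth += 1;
                if depth > recursion_limit {
                    // Including the limit tells the user which value to raise.
                    return Err(format!(
                        "recursion limit exceeded: nesting is deeper than the current limit of {recursion_limit}"
                    ));
                }
                deepest = deepest.max(depth);
            }
            ')' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(deepest)
}
```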
Bumps [sqllogictest](https://github.com/risinglightdb/sqllogictest-rs) from 0.28.0 to 0.28.1. - [Release notes](https://github.com/risinglightdb/sqllogictest-rs/releases) - [Changelog](https://github.com/risinglightdb/sqllogictest-rs/blob/main/CHANGELOG.md) - [Commits](risinglightdb/sqllogictest-rs@v0.28.0...v0.28.1) --- updated-dependencies: - dependency-name: sqllogictest dependency-version: 0.28.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* infer placeholder datatype for IN lists * infer placeholder datatype for Expr::Like * add tests for Expr::SimilarTo --------- Co-authored-by: Kevin <[email protected].>
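In an IN list, placeholder type inference amounts to giving each untyped placeholder the type of the expression being tested. In `a IN ($1, $2)`, for example, `$1` and `$2` take the type of column `a`. Below is a minimal sketch of the IN-list case only. `Expr`, `DataType`, and `infer_in_list_placeholders` are hypothetical, heavily simplified stand-ins, not DataFusion's `Expr` or Arrow's `DataType`. The LIKE and SIMILAR TO cases from the commit are not shown.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Hypothetical, heavily simplified expression tree.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Placeholder { id: String, data_type: Option<DataType> },
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

/// For `column IN ($1, $2, ...)`, assigns the column's type to every
/// placeholder in the list whose type is still unknown.
fn infer_in_list_placeholders(expr: &mut Expr, schema: &HashMap<String, DataType>) {
    if let Expr::InList { expr: target, list } = expr {
        // Only a plain column reference with a known type drives inference here.
        let target_type = match &**target {
            Expr::Column(name) => schema.get(name).copied(),
            _ => None,
        };
        if let Some(dt) = target_type {
            for item in list.iter_mut() {
                if let Expr::Placeholder { data_type, .. } = item {
                    if data_type.is_none() {
                        *data_type = Some(dt);
                    }
                }
            }
        }
    }
}
```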
Fixed an issue in the Avro reader that caused queries to fail when columns were reordered in the SELECT statement. The reader now correctly: 1. builds arrays in the order specified in the projection; 2. creates a properly ordered schema matching the projection. Previously, when columns were selected in a different order than the original schema (e.g., `SELECT timestamp, username FROM avro_table`), the reader produced an error due to type mismatches between the data arrays and the expected schema. Fixes apache#15839
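The fix depends on building the data columns and the output schema from the same projection order. Below is a minimal sketch of that idea. It assumes records have already been decoded into rows of strings in the file's field order. `Batch` and `project` are hypothetical, and the sketch is not the Avro reader's code.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified batch: output field names plus one column per field.
#[derive(Debug, PartialEq)]
struct Batch {
    fields: Vec<String>,
    columns: Vec<Vec<String>>,
}

/// Builds the output columns *and* the output schema in projection order, so
/// column `i` always matches field `i` even when the projection reorders the
/// file's columns. `rows` holds decoded records in the file's field order.
fn project(file_fields: &[&str], rows: &[Vec<&str>], projection: &[&str]) -> Result<Batch, String> {
    // Map each file field name to its position in a decoded record.
    let position: HashMap<&str, usize> = file_fields
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();

    let mut fields: Vec<String> = Vec::with_capacity(projection.len());
    let mut columns: Vec<Vec<String>> = Vec::with_capacity(projection.len());
    for name in projection {
        let i = position
            .get(*name)
            .copied()
            .ok_or_else(|| format!("column {name} not found in file schema"))?;
        fields.push(name.to_string());
        columns.push(rows.iter().map(|row| row[i].to_string()).collect());
    }
    Ok(Batch { fields, columns })
}
```

Because the schema and the columns are produced in one loop over the projection, they cannot drift out of order. That mismatch is what the commit describes as the cause of the type errors.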
…pache#15901) Co-authored-by: Andrew Lamb <[email protected]>
* feat: add union_tag scalar function * update for new api * Add test for second field type --------- Co-authored-by: Andrew Lamb <[email protected]>
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-1.44.1...tokio-1.44.2) --- updated-dependencies: - dependency-name: tokio dependency-version: 1.44.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: xudong.w <[email protected]>
Bumps [assert_cmd](https://github.com/assert-rs/assert_cmd) from 2.0.16 to 2.0.17. - [Changelog](https://github.com/assert-rs/assert_cmd/blob/master/CHANGELOG.md) - [Commits](assert-rs/assert_cmd@v2.0.16...v2.0.17) --- updated-dependencies: - dependency-name: assert_cmd dependency-version: 2.0.17 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Factor out Substrait consumers into separate files * Move relations and expressions into their own modules * Refactor: rename rex to expr * Refactor: move from_substrait_extended_expr to mod.rs --------- Co-authored-by: Andrew Lamb <[email protected]>
* add table column alias for unnest projection * fix clippy * fix columns check
* feat: Add datafusion-spark crate * spark crate setup * clean up 2 example functions * cleanup crate * Spark crate setup * fix lint issue * cargo cleanup * fix collision in sqllogic * remove redundant test * test float precision when casting to string * reorder * undo * save * save * save * add spark crate * remove spark from core * add comment to import tests * Fix: reset submodule to main pointer and clean state * Save * fix registration * modify float64 precision for spark * Update datafusion/spark/src/lib.rs Co-authored-by: Andrew Lamb <[email protected]> * clean up code * code cleanup --------- Co-authored-by: Andrew Lamb <[email protected]>
- Fix typo in introduction.md - Remove period from end of bullet point to maintain consistency with other bullet points
* migrate tests in `push_down_filters.rs` to use snapshot assertions * remove unused format checks * Revert "remove unused format checks" This reverts commit dc4f137. * migrate `assert_eq!` in `push_down_filters.rs` to use snapshot assertions * migrate `assert_eq!` in `push_down_filters.rs` to use snapshot assertions --------- Co-authored-by: Dmitrii Blaginin <[email protected]>
* Add `FormatOptions` to Config * Fix `output_with_header` * Add cli test * Add `to_string` * Prettify * Prettify * Preserve the initial `NULL` logic * Cleanup * Remove `lt` as no longer needed * Format assert * Fix sqllogictest * Fix tests * Set formatting params for dates / times * Lowercase `duration_format` --------- Co-authored-by: Andrew Lamb <[email protected]>
* docs: Label bloom_filter_on_read as a reading config * fix: Update configs.md
…zation (apache#15936) * add query to show improvement for 15591. * document the new added query.
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.14 to 0.7.15. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-util-0.7.14...tokio-util-0.7.15) --- updated-dependencies: - dependency-name: tokio-util dependency-version: 0.7.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: xudong.w <[email protected]>
* migrate `assert_eq!` in `optimize_projection/mod.rs` to use snapshot assertions * migrate `assert_optimized_plan_equal!` in `propagate_empty_relations.rs` to use snapshot assertions * remove all `assert_optimized_plan_eq` * migrate `assert_optimized_plan_equal!` in `decorrelate_predicate_subquery.rs` to use snapshot assertions * Add snapshot assertion macro for optimized plan equality checks --------- Co-authored-by: Dmitrii Blaginin <[email protected]>
…ta columns (apache#15935) * fix query results for predicates referencing partition columns and data columns * fmt * add e2e test * newline
Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.55.0 to 0.55.1. - [Release notes](https://github.com/substrait-io/substrait-rs/releases) - [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md) - [Commits](substrait-io/substrait-rs@v0.55.0...v0.55.1) --- updated-dependencies: - dependency-name: substrait dependency-version: 0.55.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: create helpers to set the max_temp_directory_size Signed-off-by: Jérémie Drouet <[email protected]> * refactor: use helper in cli Signed-off-by: Jérémie Drouet <[email protected]> * refactor: update error message Signed-off-by: Jérémie Drouet <[email protected]> * refactor: use setter in tests Signed-off-by: Jérémie Drouet <[email protected]> --------- Signed-off-by: Jérémie Drouet <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
* refactor filter pushdown apis * remove commented out code * fix tests * fail to fix bug * fix * add/fix docs * lint * add some docstrings, some minimal cleanup * review suggestions * add more comments * fix doc links * fmt * add comments * make test deterministic * add bench * fix bench * register bench * fix bench * cargo fmt --------- Co-authored-by: berkaysynnada <[email protected]> Co-authored-by: Berkay Şahin <[email protected]>
Which issue does this PR close?
See: apache#15563
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?