Upgrade DataFusion to latest, to include fixes for aggregation #216

Merged
merged 85 commits into from
Nov 9, 2023

Dandandan
Collaborator

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

mustafasrepo and others added 30 commits October 25, 2023 16:43
* Initial commit

* Address todos

* Update comments

* Simplifications

* Minor simplifications

* Address reviews

* Add TableScan constructor

* Minor changes

* make try_new_with_schema method of Aggregate private

* Use projection try_new instead of try_new_schema

* Simplifications, add comment

* Review changes

* Improve comments

* Move get_wider_type to type_coercion module

* Clean up type coercion file

---------

Co-authored-by: berkaysynnada <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…#7655)

* merge main

* fixes and cmt

* review comments, tuning parameters, updating docs

* cargo fmt

* reduce default buffer size to 2 and update docs
…e#7821)

* feat: implement read bloom filter support

* test: add unit test for read bloom filter

* Simplify bloom filter application

* test: add unit test for bloom filter with sql `in`

* fix: improve bloom filter match expression

* fix: add more test for bloom filter

* ci: rollback dependencies

* ci: merge main branch

* fix: unit tests for bloom filter

* ci: cargo clippy

* ci: cargo clippy

---------

Co-authored-by: Andrew Lamb <[email protected]>
* fix: don't push down volatile predicates in projection

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* add suggestions

* fix

* fix doc

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

* Update datafusion/optimizer/src/push_down_filter.rs

Co-authored-by: Jonah Gao <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Jonah Gao <[email protected]>
…itional (apache#7745)

* Make parquet an option by adding multiple cfg attributes without significant code changes.

* Extract parquet logic into submodule from execution::context

* Extract parquet logic into submodule from datafusion_core::dataframe

* Extract more logic into submodule from execution::context

* Move tests from execution::context

* Rename submodules
* Initial commit

* Simplifications

* Cleanup imports

* Review

---------

Co-authored-by: Mehmet Ozan Kabak <[email protected]>
…noseconds, add `to_timestamp_nanos` (apache#7844)

* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns
* Change input for `to_timestamp` function

* docs

* fix examples

* output `to_timestamp` signature as ns

* Fix CI `to_timestamp()` failed

* Update datafusion/expr/src/built_in_function.rs

Co-authored-by: Andrew Lamb <[email protected]>

* fix typo

* fix

---------

Co-authored-by: Andrew Lamb <[email protected]>
* minor: cast the updated value to the data type of target column

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* Update datafusion/sqllogictest/test_files/update.slt

Co-authored-by: Alex Huang <[email protected]>

* fix tests

---------

Co-authored-by: Alex Huang <[email protected]>
* Add simple exclude all columns test to sqllogictest

* Add more exclude test cases
…pache#7896)

* support dictionary encoded string columns for partition cols

* remove debug prints

* cargo fmt

* generic dictionary cast and dict encoded test

* updates from review

* force retry checks

* try checks again
* remove array

Signed-off-by: jayzhan211 <[email protected]>

* cleanup others

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

* cleanup cast

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

* cleanup cast

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
…trait (apache#7965)

* Lower &mut SessionContext in substrait

* rm mut ctx in tests
* Minor: Improve `HashJoinExec` documentation

* Apply suggestions from code review

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>
…he#7970)

* Add README.md to `core`, `execution` and `physical-plan` crates

* prettier

* Update datafusion/physical-plan/README.md

* Update datafusion/wasmtest/README.md

---------

Co-authored-by: Daniël Heres <[email protected]>
…7936)

* Move source repartitioning into ExecutionPlan::repartition

* cleanup

* update test

* update test

* refine docs

* fix merge
* minor: fix broken links in README.md

* fix proto link
* Minor: Update the sqllogictest crate README

* prettier

* Apply suggestions from code review

Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: jakevin <[email protected]>

---------

Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: jakevin <[email protected]>
* Fix try_from_array data type for NULL value in ListArray

* Fix

* Explicitly assert the datatype

* For review
andygrove and others added 19 commits November 6, 2023 07:44
* changelog

* update version

* update changelog
…() (apache#8059)

* deprecate BuiltinScalarFunction::supports_zero_argument()

* unify old supports_zero_argument() impl
* feat: add example to ci

* nit

* addr comments

---------

Co-authored-by: zhongjingxiong <[email protected]>
Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version.
- [Release notes](https://github.com/substrait-io/substrait-rs/releases)
- [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md)
- [Commits](substrait-io/substrait-rs@v0.18.0...v0.19.0)

---
updated-dependencies:
- dependency-name: substrait
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ppy` as well as `foo.parquet` (apache#7972)

* feat: read files based on the file extension

* fix: some file extensions start with `.` and some do not

* fix: rename extention to extension

* chore: use exec_err

* chore: rename extention to extension

* chore: rename extention to extension

* chore: simplify the code

* fix: check table is empty

* ci: fix test

* fix: add err info

* refactor: extract the logic to infer_types

* fix: add tests for different extensions

* fix: ci clippy

* fix: add more tests

* fix: simplify the logic

* fix: ci
* Minor: Improve HashJoinStream docstrings

* fix comments

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <[email protected]>

* Update datafusion/physical-plan/src/joins/hash_join.rs

Co-authored-by: comphead <[email protected]>

---------

Co-authored-by: Daniël Heres <[email protected]>
Co-authored-by: comphead <[email protected]>
* Fixing broken link

* Update docs/source/contributor-guide/index.md

Thanks for spotting this as well

Co-authored-by: Liang-Chi Hsieh <[email protected]>

---------

Co-authored-by: Liang-Chi Hsieh <[email protected]>
* fix: DataFusion suggests invalid functions

* update test

* Add test for BuiltInWindowFunction
* General array repeat

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* add test

Signed-off-by: jayzhan211 <[email protected]>

* add test

Signed-off-by: jayzhan211 <[email protected]>

* done

Signed-off-by: jayzhan211 <[email protected]>

* remove test

Signed-off-by: jayzhan211 <[email protected]>

* add comment

Signed-off-by: jayzhan211 <[email protected]>

* fm

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
… rule (apache#8061)

* Minor: remove unnecessary projection

* fix ci
…parquet dependencies (apache#8095)

* remove duplicate version numbers for arrow, object_store, and parquet dependencies

* cargo update

* use default features in parquet crate

* disable default parquet features in wasmtest
* Protobuf serde for Json file sink

* Fix tests

* Fix test
@Dandandan Dandandan merged commit ca4b6ee into v32 Nov 9, 2023
46 checks passed
@Dandandan Dandandan deleted the upgrade_df_agg branch November 9, 2023 09:36