IGNORE: chore: Merge comet-parquet-exec into (just to see diff) #1296
Closed
andygrove wants to merge 60 commits into apache:main from andygrove:comet-parquet-exec-merge-20250114
Conversation
add partial support for multiple parquet files
"filter with string" test now passes
* wip - CometNativeScan * fix and make config internal
This reverts commit 38e32f7.
…e debug logging (apache#1080) * update tests, remove some debug logging * update tests, remove some debug logging * update tests, remove some debug logging * remove unused import
…che#1081) * I think serde works. Gonna try removing the old stuff. * Fixes after merging in upstream. * Remove previous file_config logic. Clippy. * Temporary assertion for testing. * Remove old path proto value. * Selectively generate projection vector.
…stead of FileScanRDD (apache#1088) * DataSourceRDD handling (seems to be related to prefetching, so maybe not relevant for our ParquetExec). * Refactor to reduce duplicate code.
…e#1094) * init * more * fix * more * more * fix
…pache#1106) * init * more * more * fix clippy * Use Spark and Arrow types for partition schema
* fix: Use RDD partition index (apache#1112) * fix: Use RDD partition index * fix * fix * fix * fix style
…e#1138) * WIP: (POC2) A Parquet reader that uses the arrow-rs Parquet reader directly * Change default config --------- Co-authored-by: Parth Chandra <[email protected]>
…rquet (apache#1075) * implement basic native code for casting struct to struct * add another test * rustdoc * add scala side * code cleanup * clippy * clippy * add scala test * improve test * simple struct case passes * save progress * copy schema adapter code from DataFusion * more tests pass * save progress * remove debug println * remove debug println
…e#1142) * Serialize original data schema and required schema, generate projection vector on the Java side. * Sending over more schema info like column names and nullability. * Using the new stuff in the proto. About to take the old out. * Remove old logic. * remove errant print. * Serialize original data schema and required schema, generate projection vector on the Java side. * Sending over more schema info like column names and nullability. * Using the new stuff in the proto. About to take the old out. * Remove old logic. * remove errant print. * Remove commented print. format. * Remove commented print. format. * Fix projection_vector to include partition_schema cols correctly. * Rename variable.
…scan is enabled (apache#1230) * Disable DPP in stability tests, update plans for Spark 3.4 * update plans for Spark 3.5 * fix scan name * fix scan name * fix scan name * Revert a change
…pache#1237) * fix regression in DisableAQECometShuffleSuite * update comments * address feedback * typo
…f Cast expression. (apache#1229) * Copy cast.rs logic to a new parquet_support.rs. Remove Parquet dependencies on cast.rs. * Move parquet_support and schema_adapter to parquet folder. * Add fields to SparkParquetOptions.
…ive_comet (apache#1265) * fix: fix tests failing in native_recordbatch but not in native_full * fix: use session timestamp in native scans * Revert "fix: use session timestamp in native scans" This reverts commit e601deb472037338a36300992434a987bdb026e8. * Revert Change to native record batch timezone * Change stability plans to match original scan. * fix after rebase * Update plans; generate distinct plans for full native scan * generate plans for native_recordbatch * In struct tests, check Comet operator only for scan types that support complex types * Revert "Revert Change to native record batch timezone" This reverts commit 4a147f3. * Reapply "fix: use session timestamp in native scans" This reverts commit 370f901. * Fix previous commit * Rename configs and default scan impl to 'native_comet' * add missing change * fix build * update plans for spark 3.5 * Add new plans for spark 3.5 * Update plans for Spark 4.0 * Plans updated from Spark 4
…eberg_compat scans (apache#1279)
One test failure:
Multiple tests failing with the same error. Here is one example:
Which issue does this PR close?
This is a fork of the branch created by @parthchandra that merged main into `comet-parquet-exec` and it also fixes a regression where a merge conflict had resulted in reverting to use DataFusion's `FilterExec` rather than Comet's forked version which fixed a safety issue around buffer re-use.
Rationale for this change
We want to get these changes into main because it is very time-consuming to keep rebasing this work.
What changes are included in this PR?
How are these changes tested?