[SPARK-31806][SQL][TESTS] Check reading date/timestamp from legacy parquet: dictionary encoding, w/o Spark version #28630
Closed
12 commits
ad57a05 Add test (MaxGekk)
30dd3b3 Update test gen (MaxGekk)
4359c4d Re-gen parquet files (MaxGekk)
a101920 Ignore gen test (MaxGekk)
b07290e Fix dates (MaxGekk)
707e936 Bug fix tests (MaxGekk)
16a18a8 Merge remote-tracking branch 'remotes/origin/master' into parquet-fil… (MaxGekk)
95e73bc Fix merge (MaxGekk)
b0e4a32 Check 2.4 files in read by default (MaxGekk)
0add1a2 Add comments (MaxGekk)
4419760 Don't set rebase in write in test input generation (MaxGekk)
3f2b474 Address Wenchen's review comment (MaxGekk)
Binary file removed (-398 Bytes): sql/core/src/test/resources/test-data/before_1582_date_v2_4.snappy.parquet
Binary file added (+660 Bytes): sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet
Binary file added (+694 Bytes): sql/core/src/test/resources/test-data/before_1582_date_v2_4_6.snappy.parquet
Binary file added (+737 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_int96_dict_v2_4_5.snappy.parquet
Binary file added (+771 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_int96_dict_v2_4_6.snappy.parquet
Binary file added (+693 Bytes): ...core/src/test/resources/test-data/before_1582_timestamp_int96_plain_v2_4_5.snappy.parquet
Binary file added (+727 Bytes): ...core/src/test/resources/test-data/before_1582_timestamp_int96_plain_v2_4_6.snappy.parquet
Binary file removed (-494 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_int96_v2_4.snappy.parquet
Binary file removed (-436 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_micros_v2_4.snappy.parquet
Binary file added (+767 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_micros_v2_4_5.snappy.parquet
Binary file added (+801 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_micros_v2_4_6.snappy.parquet
Binary file removed (-436 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_millis_v2_4.snappy.parquet
Binary file added (+761 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_millis_v2_4_5.snappy.parquet
Binary file added (+795 Bytes): sql/core/src/test/resources/test-data/before_1582_timestamp_millis_v2_4_6.snappy.parquet
Why do we have 2 files, one for plain and one for dictionary encoding, for INT96? The other types just have one file with 2 columns.
If it's caused by some Parquet limitation, let's write a comment to explain it.
Because INT96 always uses dictionary encoding, regardless of the number of values and their uniqueness. I have to explicitly turn off dictionary encoding while saving to parquet files, see the test above.
Other types don't have this "problem": for one column the parquet lib uses dictionary encoding because all values in the date/timestamp column are the same, and for the other one it applies plain encoding because all values are unique.
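For readers unfamiliar with the two encodings, here is a toy Python sketch of the idea. It is not the Parquet library's writer logic, and the helper names, the row count `N`, and the sample dates are made up for illustration. It only shows why dictionary encoding pays off when values repeat and gains nothing when every value is distinct.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Toy dictionary encoding: store each distinct value once, plus one index per row."""
    dictionary: List[str] = []
    positions: Dict[str, int] = {}
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary: List[str], indices: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the per-row indices."""
    return [dictionary[i] for i in indices]


# Hypothetical sample columns (assumed values, not taken from the PR).
N = 8
same_values = ["1001-01-01"] * N
unique_values = [f"1001-01-0{i}" for i in range(1, N + 1)]

same_dict, same_idx = dictionary_encode(same_values)
unique_dict, unique_idx = dictionary_encode(unique_values)

# Repeated values collapse to a single dictionary entry.
print(len(same_dict), same_idx)
# Unique values produce a dictionary as long as the column itself,
# so storing the values directly ("plain") is at least as compact.
print(len(unique_dict), unique_idx)
```

In this sketch the repeated column needs one dictionary entry for all rows, while the unique column needs one entry per row, which is the case where a writer would be expected to prefer plain encoding.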
I added a comment