forked from apache/parquet-java
Commit
PARQUET-84: Avoid reading rowgroup metadata in memory on the client side.

This will improve reading big datasets with a large schema (thousands of columns). Instead, rowgroup metadata can be read in the tasks, where each task reads only the metadata of the file it's reading.

Author: julien <julien@twitter.com>

Closes apache#45 from julienledem/skip_reading_row_groups and squashes the following commits:

ccdd08c [julien] fix parquet-hive
24a2050 [julien] Merge branch 'master' into skip_reading_row_groups
3d7e35a [julien] adress review feedback
5b6bd1b [julien] more tests
323d254 [julien] sdd unit tests
f599259 [julien] review feedback
fb11f02 [julien] fix backward compatibility check
2c20b46 [julien] cleanup readFooters methods
3da37d8 [julien] fix read summary
ab95a45 [julien] cleanup
4d16df3 [julien] implement task side metadata
9bb8059 [julien] first stab at integrating skipping row groups
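
The idea in the description can be pictured with a small, self-contained sketch. This is not the code from this commit: the class, the `RowGroup` and `Split` types, and both methods are hypothetical stand-ins, and the rule that a task keeps the row groups whose midpoint falls inside its byte range is an assumption made for illustration. The client plans splits from the file length alone, and each task then selects its own row groups from the footer of the one file it reads.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskSideMetadataSketch {

    /** Hypothetical stand-in for one row group entry in a file footer. */
    static final class RowGroup {
        final long startOffset;
        final long totalSize;
        RowGroup(long startOffset, long totalSize) {
            this.startOffset = startOffset;
            this.totalSize = totalSize;
        }
        long midpoint() { return startOffset + totalSize / 2; }
    }

    /** Hypothetical split: a byte range of one file, with no row group metadata attached. */
    static final class Split {
        final String file;
        final long start; // inclusive
        final long end;   // exclusive
        Split(String file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }
    }

    /** Client side: plan splits from the file length only, without reading any footer. */
    static List<Split> planSplits(String file, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(file, start, Math.min(start + splitSize, fileLength)));
        }
        return splits;
    }

    /** Task side: from the footer of the task's own file, keep the row groups
     *  whose midpoint lies in [start, end) (assumed assignment rule). */
    static List<RowGroup> rowGroupsFor(Split split, List<RowGroup> footer) {
        List<RowGroup> mine = new ArrayList<>();
        for (RowGroup rg : footer) {
            long mid = rg.midpoint();
            if (mid >= split.start && mid < split.end) {
                mine.add(rg);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Made-up footer with three row groups of 100 bytes each.
        List<RowGroup> footer = new ArrayList<>();
        footer.add(new RowGroup(4, 100));
        footer.add(new RowGroup(104, 100));
        footer.add(new RowGroup(204, 100));
        for (Split s : planSplits("part-0.parquet", 320, 128)) {
            System.out.println("[" + s.start + ", " + s.end + ") -> "
                + rowGroupsFor(s, footer).size() + " row group(s)");
        }
    }
}
```

Because every row group has exactly one midpoint, each one is picked up by exactly one split under this rule, and the client never needs to hold per-column row group metadata for the whole dataset.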
1 parent faf4239, commit 29cca34
Showing 22 changed files with 1,361 additions and 724 deletions.