
cleanup fix for type mismatch in Parquet predicate pushdown #11118

Closed
nishantrayan wants to merge 1 commit into prestodb:master from lyft:predicate_fix_cleanup

Conversation

@nishantrayan
Contributor

Follow-up from #9975 to clean up the logic.

@nishantrayan
Contributor Author

CC @nezihyigitbasi

Contributor

@nezihyigitbasi left a comment

I took a quick pass. I will take a detailed look after you address the comments.

Contributor

As far as I can see in the code, it's possible to pass this in the constructor, so please do so: mark hiveType as final and mark RichColumnDescriptor as immutable. Also, please update the comment in this class.
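A minimal sketch of the suggested shape (hypothetical: plain String fields stand in for the real ColumnDescriptor state and HiveType, which are not reproduced here): the value arrives through the constructor instead of a setter, so the field can be final and the class immutable.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical simplified stand-in for RichColumnDescriptor: hiveType is
// injected via the constructor (no setter), so the field is final and the
// class as a whole is immutable.
final class ImmutableDescriptorSketch
{
    private final String path;     // stands in for the ColumnDescriptor state
    private final String hiveType; // stands in for the real HiveType

    ImmutableDescriptorSketch(String path, String hiveType)
    {
        this.path = requireNonNull(path, "path is null");
        this.hiveType = requireNonNull(hiveType, "hiveType is null");
    }

    String getPath()
    {
        return path;
    }

    String getHiveType()
    {
        return hiveType;
    }
}
```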

Contributor

This stream expression is complex, please use a regular foreach loop instead.
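As an illustration of the suggestion (hypothetical data and names, not the PR's actual pipeline), a filter-and-collect stream rewritten as a plain for-each loop:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: a chained stream().filter().collect() pipeline
// replaced with an explicit loop, which is easier to scan and debug.
class ForEachExample
{
    // Stream version for comparison:
    //   names.stream().filter(n -> !n.isEmpty())
    //        .collect(toMap(n -> n, String::length));
    static Map<String, Integer> lengthsOfNonEmpty(List<String> names)
    {
        Map<String, Integer> result = new HashMap<>();
        for (String name : names) {
            if (!name.isEmpty()) {
                result.put(name, name.length());
            }
        }
        return result;
    }
}
```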

Contributor

ditto (use foreach to make this easier to read)

Contributor

I think you can change this logic to pass the hive type to the rich column descriptor constructor instead of setting it via a setter. Please see my comment below for RichColumnDescriptor.

Contributor

use instanceof

Contributor

unnecessary else.

@nishantrayan nishantrayan force-pushed the predicate_fix_cleanup branch from a5a4661 to 9ddd30e Compare July 28, 2018 01:08
@nishantrayan
Contributor Author

@nezihyigitbasi I made the requested cleanups.

Contributor

@nezihyigitbasi left a comment

Thanks, I left some more comments and this is getting pretty close.

Contributor

ImmutableList.copyOf(typeColumns.keySet())

Contributor

ImmutableList.copyOf(typeColumns.keySet())

Contributor

We can put one parameter per line as this line is long:

    public static Map<List<String>, RichColumnDescriptor> getDescriptors(
            MessageType fileSchema,
            MessageType requestedSchema,
            Map<parquet.schema.Type, HiveColumnHandle> typeColumns)
    {

Contributor

We can simplify this a bit:

Optional<RichColumnDescriptor> richColumnDescriptor = getDescriptor(columns, columnPath, hiveColumnHandle);
richColumnDescriptor.ifPresent(descriptor -> descriptorsByPath.put(columnPath, descriptor));

Contributor

Put it on the same line:

HiveType hiveType = hiveColumnHandle == null ? null : hiveColumnHandle.getHiveType();
return Optional.of(new RichColumnDescriptor(columnIO.getColumnDescriptor(), columnIO.getType().asPrimitiveType(), hiveType));

Contributor

remove this else, it's unnecessary.

Contributor

// RichColumnDescriptor extends ColumnDescriptor and exposes the PrimitiveType and HiveType information.

Contributor

this.hiveType = requireNonNull(hiveType, "hiveType is null");

It would be nice if you add non-null checks to other fields too (preferably in a separate commit).

Contributor Author

I think we need hiveType to be optional, looking at a couple of places where RichColumnDescriptor is initialized: https://github.com/prestodb/presto/pull/11118/files#diff-0f09736fc6c9cfc0691776d4365d409bL233

Contributor

Then please update the constructor to accept Optional<HiveType> and add the non-null check.
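A hedged sketch of that constructor shape (hypothetical: Optional<String> stands in for Optional<HiveType>, which is not reproduced here): the Optional reference itself is rejected if null via requireNonNull, while an absent type is modeled by Optional.empty().

```java
import java.util.Optional;

import static java.util.Objects.requireNonNull;

// Hypothetical stand-in: the constructor takes an Optional and rejects a
// null Optional reference, while Optional.empty() means "no hive type".
final class OptionalTypeDescriptorSketch
{
    private final Optional<String> hiveType;

    OptionalTypeDescriptorSketch(Optional<String> hiveType)
    {
        this.hiveType = requireNonNull(hiveType, "hiveType is null");
    }

    Optional<String> getHiveType()
    {
        return hiveType;
    }
}
```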

Contributor

Map<Type, HiveColumnHandle> typeColumns = ImmutableMap.of(getParquetType(columnHandle, fileSchema, true), columnHandle);

Please also update the changes below.

@nishantrayan nishantrayan force-pushed the predicate_fix_cleanup branch 2 times, most recently from df137a6 to f345023 Compare July 31, 2018 18:50
@nishantrayan
Contributor Author

Need 👀 @nezihyigitbasi
Also having trouble with failing CI; I tried to repro locally without any success.

@nezihyigitbasi
Contributor

@nishantrayan I can reproduce the failures if I run TestFullParquetReader locally. One issue that's causing test failures is in ParquetTypeUtils::getDescriptor() on L139: Optional.of(hiveType). hiveType can be null, so that call can throw; it should be Optional.ofNullable(hiveType). Please also rebase.
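The difference is easy to demonstrate in isolation with plain JDK Optional (this is a standalone illustration, not the Presto code itself):

```java
import java.util.Optional;

// Demonstrates the failure mode described above: Optional.of(null) throws
// NullPointerException, while Optional.ofNullable(null) returns an empty
// Optional instead.
class OfNullableDemo
{
    static Optional<String> wrapSafely(String hiveType)
    {
        // Safe even when hiveType is null.
        return Optional.ofNullable(hiveType);
    }
}
```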

@nishantrayan nishantrayan force-pushed the predicate_fix_cleanup branch from f345023 to 60f8564 Compare August 5, 2018 23:14
@nezihyigitbasi
Contributor

There are test failures.

@nishantrayan
Contributor Author

👀 into the failures

@nishantrayan
Contributor Author

@nezihyigitbasi I am having trouble finding the actual failure: the integration test console doesn't give me much info, and local tests are passing. Is there any way I can run / reproduce the failure locally?

@nishantrayan
Contributor Author

rebased with latest master

@nishantrayan
Contributor Author

@nezihyigitbasi wondering if you have some pointers on how to find the actual failure reason and get to the bottom of this.

@nishantrayan nishantrayan force-pushed the predicate_fix_cleanup branch 2 times, most recently from 191623b to efd3093 Compare October 11, 2018 03:31
if (descriptor.getHiveType().isPresent()) {
    TypeInfo typeInfo = descriptor.getHiveType().get().getTypeInfo();
    switch (typeInfo.getTypeName()) {
        case StandardTypes.SMALLINT:
Contributor

Does tinyint need to be handled here? In #8243 (comment) they mention getting the error Mismatched Domain types: tinyint vs integer.

Also, maybe verify that dictionary filtering works for these smaller int types. It looks like min/max filtering would work here, but dictionary filtering doesn't check for those types (e.g. here), although that's probably a less common use case.

.map(column -> getParquetType(column, fileSchema, useParquetColumnNames))
.filter(Objects::nonNull)
.collect(toList());
Map<parquet.schema.Type, HiveColumnHandle> typeColumns = new HashMap<>();
Contributor

The nonNull filter got removed from this, but the copy of this logic in ParquetPageSourceFactory still does it. Possibly just consolidate this logic into ParquetTypeUtils and call it from both places.

@findepi
Contributor

findepi commented Mar 4, 2019

@nishantrayan
I changed the Parquet predicate pushdown code in trinodb/trino#131.
I was aware of your PR in this space, but it appeared I could fix this in a simpler way.
This change was released in Presto 302 (about a month ago).
(@nezihyigitbasi is backporting the fix in #12408, so it will be available in this repo as well.)

@puneetjaiswal puneetjaiswal restored the predicate_fix_cleanup branch April 8, 2019 17:21
@aweisberg
Contributor

Seems like Nezih backported the PrestoSQL fix for this? #12408

Closing. Please reopen if I am incorrect.

@aweisberg aweisberg closed this Jun 26, 2019