cleanup fix for type mismatch in Parquet predicate pushdown #11118
nishantrayan wants to merge 1 commit into prestodb:master
Conversation
Force-pushed: b447dad to a5a4661
nezihyigitbasi left a comment:
I took a quick pass. I will take a detailed look after you address the comments.
It's possible to pass this in the constructor as far as I can see in the code. So, please do so, mark hiveType as final and mark RichColumnDescriptor as Immutable. Also please update the comment in this class.
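The suggested shape, as a minimal sketch with a stand-in `HiveType` (the real Presto classes carry more state than shown here):

```java
import static java.util.Objects.requireNonNull;

// Stand-in for Presto's HiveType, for illustration only.
class HiveType {
    private final String name;
    HiveType(String name) { this.name = requireNonNull(name, "name is null"); }
    String getName() { return name; }
}

// Sketch of an immutable RichColumnDescriptor: hiveType is passed in the
// constructor and marked final, so no setter is needed.
final class RichColumnDescriptor {
    private final HiveType hiveType;

    RichColumnDescriptor(HiveType hiveType) {
        this.hiveType = requireNonNull(hiveType, "hiveType is null");
    }

    HiveType getHiveType() { return hiveType; }
}

public class Main {
    public static void main(String[] args) {
        RichColumnDescriptor descriptor = new RichColumnDescriptor(new HiveType("smallint"));
        System.out.println(descriptor.getHiveType().getName()); // prints "smallint"
    }
}
```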
This stream expression is complex, please use a regular foreach loop instead.
ditto (use foreach to make this easier to read)
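For illustration only (hypothetical column names, not the PR's actual expression), the same map a stream pipeline would build can be written as a plain foreach loop:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        List<String> columns = List.of("ship_date", "order_key", "comment");

        // A plain loop reads more directly than a stream with a custom collector.
        Map<String, Integer> lengthsByName = new HashMap<>();
        for (String column : columns) {
            lengthsByName.put(column, column.length());
        }
        System.out.println(lengthsByName.get("order_key")); // prints 9
    }
}
```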
I think you can change this logic to pass the hive type to the rich column descriptor constructor instead of setting it via a setter. Please see my comment below for RichColumnDescriptor.
Force-pushed: a5a4661 to 9ddd30e
@nezihyigitbasi made the cleanups requested
nezihyigitbasi left a comment:
Thanks, I left some more comments and this is getting pretty close.
ImmutableList.copyOf(typeColumns.keySet())
ImmutableList.copyOf(typeColumns.keySet())
We can put one parameter per line as this line is long:
public static Map<List<String>, RichColumnDescriptor> getDescriptors(
MessageType fileSchema,
MessageType requestedSchema,
Map<parquet.schema.Type, HiveColumnHandle> typeColumns)
{
We can simplify this a bit:
Optional<RichColumnDescriptor> richColumnDescriptor = getDescriptor(columns, columnPath, hiveColumnHandle);
richColumnDescriptor.ifPresent(descriptor -> descriptorsByPath.put(columnPath, descriptor));
Put it on the same line:
HiveType hiveType = hiveColumnHandle == null ? null : hiveColumnHandle.getHiveType();
return Optional.of(new RichColumnDescriptor(columnIO.getColumnDescriptor(), columnIO.getType().asPrimitiveType(), hiveType));
remove this else, it's unnecessary.
// RichColumnDescriptor extends ColumnDescriptor and exposes the PrimitiveType and HiveType information.
this.hiveType = requireNonNull(hiveType, "hiveType is null");
It would be nice if you add non-null checks to other fields too (preferably in a separate commit).
I think we need hiveType to be optional, looking at a couple of places where RichColumnDescriptor is initialized: https://github.com/prestodb/presto/pull/11118/files#diff-0f09736fc6c9cfc0691776d4365d409bL233
Then please update the constructor to accept Optional&lt;HiveType&gt; and add the non-null check.
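A sketch of that constructor, again with a stand-in `HiveType` (illustrative only): `Optional.empty()` covers the call sites that have no Hive type, while `requireNonNull` still rejects a null Optional itself.

```java
import java.util.Optional;
import static java.util.Objects.requireNonNull;

// Stand-in for Presto's HiveType, for illustration only.
class HiveType {
    private final String name;
    HiveType(String name) { this.name = requireNonNull(name, "name is null"); }
    String getName() { return name; }
}

final class RichColumnDescriptor {
    // Optional.empty() when the call site has no Hive type available.
    private final Optional<HiveType> hiveType;

    RichColumnDescriptor(Optional<HiveType> hiveType) {
        this.hiveType = requireNonNull(hiveType, "hiveType is null");
    }

    Optional<HiveType> getHiveType() { return hiveType; }
}

public class Main {
    public static void main(String[] args) {
        RichColumnDescriptor withType = new RichColumnDescriptor(Optional.of(new HiveType("smallint")));
        RichColumnDescriptor withoutType = new RichColumnDescriptor(Optional.empty());
        System.out.println(withType.getHiveType().isPresent());    // prints true
        System.out.println(withoutType.getHiveType().isPresent()); // prints false
    }
}
```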
Map<Type, HiveColumnHandle> typeColumns = ImmutableMap.of(getParquetType(columnHandle, fileSchema, true), columnHandle);
Please also update the changes below.
Force-pushed: df137a6 to f345023
need 👀 @nezihyigitbasi
@nishantrayan I can reproduce the failures if I run
Force-pushed: f345023 to 60f8564
There are test failures.
👀 into the failures
@nezihyigitbasi I am having trouble finding the actual failure. The integration test console doesn't give me much info, and local tests are passing. Is there any way I can run / reproduce the failure locally?
Force-pushed: 60f8564 to 4e6fc68
rebased with latest master
Force-pushed: 4e6fc68 to b2874ef
@nezihyigitbasi wondering if you have some pointers on how to find the actual failure reason and get to the bottom of this.
Force-pushed: 191623b to efd3093
Force-pushed: efd3093 to c3646ce
if (descriptor.getHiveType().isPresent()) {
    TypeInfo typeInfo = descriptor.getHiveType().get().getTypeInfo();
    switch (typeInfo.getTypeName()) {
        case StandardTypes.SMALLINT:
Does tinyint need to be handled here? In #8243 (comment) they mention getting the error Mismatched Domain types: tinyint vs integer.
Also, maybe verify that dictionary filtering works for these smaller int types. It looks like min/max filtering would work here, but dictionary filtering doesn't check for those types (e.g. here), although that's probably a less common use case.
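The tinyint question can be sketched as a predicate over the type names (a hedged stand-in: plain strings here replace the StandardTypes constants, and the helper name is hypothetical):

```java
public class Main {
    // Hypothetical helper: true for the small integer types whose Domain
    // would otherwise mismatch Parquet's INT32-backed column statistics.
    static boolean isSmallIntegerType(String typeName) {
        switch (typeName) {
            case "tinyint":   // per the review question and #8243
            case "smallint":
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSmallIntegerType("tinyint"));  // prints true
        System.out.println(isSmallIntegerType("integer"));  // prints false
    }
}
```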
    .map(column -> getParquetType(column, fileSchema, useParquetColumnNames))
    .filter(Objects::nonNull)
    .collect(toList());
Map<parquet.schema.Type, HiveColumnHandle> typeColumns = new HashMap<>();
The nonNull filter was removed here, but the copy of this logic in ParquetPageSourceFactory still does it. Possibly just consolidate this logic into ParquetTypeUtils and call it from both places.
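One hedged way to consolidate (the generic signature below is a stand-in; the real helper would take HiveColumnHandle and the Parquet schema types): a single method keeps the nonNull filtering in one place for both call sites.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class Main {
    // Hypothetical shared helper: builds the type -> column map, skipping
    // columns whose Parquet type cannot be resolved (resolver returns null),
    // which preserves the old filter(Objects::nonNull) behavior at both call sites.
    static <C, T> Map<T, C> getTypeColumns(List<C> columns, Function<C, T> parquetTypeResolver) {
        Map<T, C> typeColumns = new HashMap<>();
        for (C column : columns) {
            T parquetType = parquetTypeResolver.apply(column);
            if (parquetType != null) {
                typeColumns.put(parquetType, column);
            }
        }
        return typeColumns;
    }

    public static void main(String[] args) {
        // "b" has no resolvable type and is skipped.
        Map<String, String> typeColumns = getTypeColumns(
                List.of("a", "b", "c"),
                column -> column.equals("b") ? null : "type_" + column);
        System.out.println(typeColumns.size()); // prints 2
    }
}
```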
@nishantrayan
Seems like Nezih backported the PrestoSQL fix for this? #12408 Closing. Please reopen if I am incorrect.
Follow-up from #9975 to clean up the logic.