
Add pushdown for parquet timestamp predicate#4104

Merged
findepi merged 1 commit into trinodb:master from alexjo2144:parquet-timestamp-pushdown on Jun 24, 2020

Conversation

@alexjo2144
Member

No description provided.

@cla-bot cla-bot bot added the cla-signed label Jun 19, 2020

@findepi findepi left a comment


Nice!
editorial comments


That's not too long, but it would be slightly better to extract a QueryInfo fullQueryInfo variable.

@alexjo2144
Member Author

Updated, thanks

@alexjo2144 alexjo2144 force-pushed the parquet-timestamp-pushdown branch from 1e45799 to fba7bcc on June 23, 2020 16:13
@alexjo2144 alexjo2144 force-pushed the parquet-timestamp-pushdown branch from fba7bcc to 80ee8a5 on June 23, 2020 17:59
@findepi findepi merged commit dc1c442 into trinodb:master Jun 24, 2020
@findepi findepi added this to the 337 milestone Jun 24, 2020
@findepi
Member

findepi commented Jun 24, 2020

Merged, thanks!

@findepi findepi mentioned this pull request Jun 24, 2020
@ryanrupp
Member

Does this cover int64 columns logically typed as TIMESTAMP_MICROS or TIMESTAMP_MILLIS (and, I believe, timestamp nanos in newer versions of Parquet)? I wasn't familiar with ParquetTimestampUtils, which is used here, but it seems to reference int96 timestamps, which are deprecated in Parquet. Also, this is using BinaryStatistics then (probably what you get for int96-encoded timestamps? not familiar).

For reference, there was an older PR, #1999, that added pushdown for timestamps on int64 via LongStatistics.

@alexjo2144 alexjo2144 deleted the parquet-timestamp-pushdown branch July 14, 2020 18:10
@alexjo2144
Member Author

Hey Ryan, thanks for pointing that out. I think you're right; this only works for the legacy int96 timestamps. I didn't realize they were deprecated.

Do you know if Presto's Parquet reader/writer supports the new format, or should we file an issue to add it? There are probably some more changes needed to support the parametric timestamp types that Dain has been working on, too.

@ryanrupp
Member

@alexjo2144 I'm not sure what the writer actually writes out currently (my write path isn't through Presto at the moment). For the read path, I'm pretty sure it can read int64 TIMESTAMP_MILLIS, but I'm not sure about int64 logically typed as TIMESTAMP_MICROS. I thought of this because there was a post in Slack asking whether TIMESTAMP_MICROS is supported on the read path here.

Here's the documentation about timestamp handling in Parquet. One of the newer features is support for nanosecond precision stored in an int64 (with nanosecond precision in int96 being deprecated; see here).
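The unit distinction matters because an int64 timestamp stores a plain epoch offset in the annotated unit, so its min/max statistics compare correctly as ordinary longs. A minimal conversion sketch for a TIMESTAMP_MICROS value (the class and method names here are illustrative, not from the Presto codebase):

```java
import java.time.Instant;

public class Int64TimestampMicros {
    // An int64 column annotated TIMESTAMP(MICROS, ...) stores microseconds
    // since the Unix epoch; numeric order equals chronological order, so
    // LongStatistics min/max are directly usable for pushdown.
    static Instant fromEpochMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }
}
```

Using floorDiv/floorMod keeps the conversion correct for pre-epoch (negative) values as well.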

@alexjo2144
Member Author

Gotcha. Looks like the default Presto Parquet writer uses int96, but there's a new writer, added the same day as these changes, that supports int64 at millisecond precision (#3400). That should help for testing.

I can work on putting up an updated version of #1999 to tie that in.

pettyjamesm referenced this pull request in pettyjamesm/prestosql Sep 4, 2020
Added in prestosql#4104, predicate pushdown for parquet INT96 timestamp
values can result in incorrect results even when stats appear valid by
checking min <= max. Parquet writers that produced statistics at all
were comparing min and max values as BINARY which is oblivious to the
semantics of how INT96 timestamps are encoded making them unusable.

Comparison of INT96 values for statistics was removed in PARQUET-1065
for all cases except when min == max, which would not be affected by
the comparison issue or other byte order issues that existed with
parquet BINARY types at the time. Any parquet file that contains INT96
statistics where min != max would have to have been written by an older
parquet writer that compared the values incorrectly, making those
statistics unusable.

This change disables parquet predicate pushdown on INT96 timestamps
except for when all rows have the same value (ie: min == max).
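The commit message above hinges on the INT96 layout: 12 bytes, with 8 bytes of little-endian nanoseconds-of-day followed by a 4-byte little-endian Julian day. A byte-wise BINARY comparison therefore inspects the nanos before the day and does not follow chronological order. A minimal decoding sketch (the class name is illustrative; Trino's actual helper lives in ParquetTimestampUtils):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

public class Int96TimestampDecoder {
    // Julian day number of the Unix epoch (1970-01-01)
    private static final int JULIAN_EPOCH_DAY = 2_440_588;
    private static final long SECONDS_PER_DAY = 86_400L;

    // Decode a 12-byte INT96 timestamp: 8 bytes little-endian
    // nanoseconds-of-day, then 4 bytes little-endian Julian day.
    // Because the day (the most significant part chronologically)
    // sits in the trailing bytes, lexicographic comparison of the
    // raw 12 bytes is meaningless for ordering timestamps.
    public static Instant decode(byte[] int96) {
        ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong();
        int julianDay = buffer.getInt();
        long epochSeconds = (julianDay - JULIAN_EPOCH_DAY) * SECONDS_PER_DAY
                + nanosOfDay / 1_000_000_000L;
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay % 1_000_000_000L);
    }
}
```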
martint referenced this pull request Sep 4, 2020
@ericxiao251

This improvement really helped with one of our queries, but I want to understand a bit more about what's actually happening with Presto taking advantage of the Parquet metadata.

What does the getDomain function in the TupleDomainParquetPredicate.java file do? How does leveraging timestamp metadata help when data is stored on disk in no particular order?

@alexjo2144
Member Author

So, Parquet files are separated into RowGroups, and each RowGroup gets an entry in the file footer with statistics information. This section of the code reads that footer and determines which sections of the file can be skipped by comparing the footer statistics with any filters from the query. getDomain returns that statistics information as a Domain that the engine can use for this comparison. Skipping whole groups reduces IO and also means less filtering once the data is loaded into the engine.
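A rough sketch of the pruning idea described above, with simplified stand-ins for the footer statistics and the query filter (the real code builds Trino Domain objects and intersects them; all names here are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class RowGroupPruner {
    // Simplified stand-in for a RowGroup's min/max column statistics
    // from the Parquet file footer.
    record Stats(long min, long max) {}

    record RowGroup(String name, Stats stats) {}

    // Keep only row groups whose [min, max] statistics range overlaps
    // the query's filter range [filterMin, filterMax]; non-overlapping
    // groups cannot contain matching rows and are never read from disk.
    static List<RowGroup> prune(List<RowGroup> groups, long filterMin, long filterMax) {
        return groups.stream()
                .filter(g -> g.stats().max() >= filterMin && g.stats().min() <= filterMax)
                .collect(Collectors.toList());
    }
}
```

Note that this only helps when values are clustered so that row-group ranges are narrow; if every group's range spans the whole filter, nothing can be skipped, which is why sort order on the filtered column matters.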
