@parthchandra
Contributor

This is a follow-up with minor fixes and additions for the vector IO based file reader.

Jira

  • PARQUET-2171 : support hadoop vector io

Tests

  • Existing tests are sufficient

Documentation

  • Existing documentation is sufficient

@parthchandra
Contributor Author

@wgtmac, @steveloughran Some minor additions to the vector IO based file reader. This adds the read metrics that were already added to the serial reader path. It also makes the default construction of the read options read the vector IO setting from the Hadoop conf.
Please take a look.
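
For readers following along, the second change (read options taking their default from the configuration) might look roughly like the sketch below. This is an illustrative stand-in and not the code in this PR: `ReadOptionsSketch`, its `Builder`, `withUseVectoredIo`, and the key string are all hypothetical names chosen for the example, and a plain `Map` stands in for the Hadoop conf so the snippet runs without any dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a read-options class; not the parquet-mr implementation.
public class ReadOptionsSketch {
    // Assumed key name, used for illustration only.
    static final String VECTORED_IO_KEY = "parquet.hadoop.vectored.io.enabled";

    private final boolean useVectoredIo;

    private ReadOptionsSketch(boolean useVectoredIo) {
        this.useVectoredIo = useVectoredIo;
    }

    public boolean useVectoredIo() {
        return useVectoredIo;
    }

    // The builder takes its default from the configuration instead of a hard-coded constant.
    public static Builder builder(Map<String, String> conf) {
        return new Builder(conf);
    }

    public static class Builder {
        private boolean useVectoredIo;

        Builder(Map<String, String> conf) {
            // Default construction reads the setting from the conf; a missing key means disabled.
            this.useVectoredIo =
                Boolean.parseBoolean(conf.getOrDefault(VECTORED_IO_KEY, "false"));
        }

        // An explicit call still overrides whatever the conf said.
        public Builder withUseVectoredIo(boolean value) {
            this.useVectoredIo = value;
            return this;
        }

        public ReadOptionsSketch build() {
            return new ReadOptionsSketch(useVectoredIo);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(VECTORED_IO_KEY, "true");

        System.out.println(builder(conf).build().useVectoredIo());            // true
        System.out.println(builder(new HashMap<>()).build().useVectoredIo()); // false
        System.out.println(
            builder(conf).withUseVectoredIo(false).build().useVectoredIo());  // false
    }
}
```

The point of the pattern is that callers who build options without touching the vector IO flag still pick up whatever the conf specifies, while an explicit builder call keeps precedence.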

@wgtmac wgtmac merged commit 337d082 into apache:master Apr 29, 2024
@parthchandra
Contributor Author

Thank you @wgtmac !

@steveloughran
Contributor

Looks great. If there's another 1.14.0 RC, will this go into it?

Note that we create lots and lots of IOStatistics; for vector reads we include the number of bytes read and discarded along with all the other timings. My WiP to make that accessible via reflection will help, but it would still need work in Parquet to aggregate.
apache/hadoop#6686
You can have all the stats as a piece of JSON if that helps; then the Parquet lib just has its own copy of the stats class to parse it...

@wgtmac
Member

wgtmac commented May 6, 2024

I think this is already included in the 1.14.0 RC0/RC1

clairemcginty pushed a commit to clairemcginty/parquet-mr that referenced this pull request May 17, 2024