PARQUET-2171: (followup) add read metrics and hadoop conf integration for vector io reader #1330

parthchandra · 2024-04-27T00:54:08Z

This is a followup with minor fixes and additions for the vector IO based file reader.

Jira

PARQUET-2171: support hadoop vector io

Tests

Existing tests are sufficient

Documentation

Existing documentation is sufficient


parthchandra · 2024-04-27T00:56:26Z

@wgtmac, @steveloughran Some minor additions to the vector IO based file reader. This adds to the vector IO path the read metrics that were added in the serial reader path. It also changes the default construction of the read options so that the vector IO setting is read from the hadoop conf.
Please take a look.
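For readers following along, here is a minimal sketch of the "default from conf, override explicitly" pattern described above. It is not the code in this PR: the `Map` stands in for a Hadoop `Configuration`, and the key name, `ReadOptions`, and `Builder` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: stand-in types, not the parquet-mr or Hadoop API. */
public class VectorIoOptionSketch {
    // Hypothetical configuration key, used for illustration only.
    static final String VECTOR_IO_KEY = "example.vectored.io.enabled";
    static final boolean VECTOR_IO_DEFAULT = false;

    /** Immutable read options carrying a single flag. */
    static final class ReadOptions {
        private final boolean useVectoredIo;

        private ReadOptions(boolean useVectoredIo) {
            this.useVectoredIo = useVectoredIo;
        }

        boolean useVectoredIo() {
            return useVectoredIo;
        }
    }

    /** Builder whose default comes from the configuration map. */
    static final class Builder {
        private boolean useVectoredIo;

        Builder(Map<String, String> conf) {
            // Fall back to the default when the key is absent.
            String raw = conf.get(VECTOR_IO_KEY);
            this.useVectoredIo =
                raw == null ? VECTOR_IO_DEFAULT : Boolean.parseBoolean(raw.trim());
        }

        /** An explicit call takes precedence over the configured value. */
        Builder withVectoredIo(boolean enabled) {
            this.useVectoredIo = enabled;
            return this;
        }

        ReadOptions build() {
            return new ReadOptions(useVectoredIo);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(VECTOR_IO_KEY, "true");
        System.out.println(new Builder(conf).build().useVectoredIo());
        System.out.println(new Builder(conf).withVectoredIo(false).build().useVectoredIo());
    }
}
```

The point of the pattern is that callers who never touch the builder still pick up the setting from their configuration, while callers who set the flag explicitly are not overridden by it.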

parthchandra · 2024-04-30T16:26:54Z

Thank you @wgtmac !

steveloughran · 2024-05-02T19:48:52Z

Looks great. If there's another 1.14.0 RC, will this go into it?

Note that we create lots and lots of IOStatistics; for vector reads we include the number of bytes read and discarded along with all the other timings. My WiP to make that accessible via reflection will help, but it would still need work in parquet to aggregate.
apache/hadoop#6686
You can have all the stats as a piece of JSON if that helps; then the parquet lib just has its own copy of the stats class to parse it...
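To illustrate what "aggregate" could mean here, the following is a rough sketch that sums per-stream counters into one total. It assumes the statistics have already been parsed into plain maps; the class, the method, and the counter names `bytes_read` and `bytes_discarded` are hypothetical and are not taken from Hadoop or parquet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: sums named counters collected from several streams. */
public class StatsAggregationSketch {

    /** Adds up counters with the same name across all snapshots. */
    static Map<String, Long> aggregate(List<Map<String, Long>> perStream) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> counters : perStream) {
            // merge() inserts the value if the name is new, else sums it.
            counters.forEach((name, value) -> total.merge(name, value, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical counter names, for illustration only.
        Map<String, Long> first = new HashMap<>();
        first.put("bytes_read", 4096L);
        first.put("bytes_discarded", 128L);

        Map<String, Long> second = new HashMap<>();
        second.put("bytes_read", 1024L);

        List<Map<String, Long>> snapshots = new ArrayList<>();
        snapshots.add(first);
        snapshots.add(second);

        System.out.println(aggregate(snapshots));
    }
}
```

A counter missing from one stream simply contributes nothing to the total, so streams that report different sets of statistics can still be combined.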

wgtmac · 2024-05-06T01:44:35Z

I think this is already included in the 1.14.0 RC0/RC1.


PARQUET-2171: (followup) add read metrics and hadoop conf integration for vector io reader

96c8d84

wgtmac approved these changes Apr 27, 2024


wgtmac merged commit 337d082 into apache:master Apr 29, 2024

clairemcginty pushed a commit to clairemcginty/parquet-mr that referenced this pull request May 17, 2024

PARQUET-2171: (followup) add read metrics and hadoop conf integration for vector io reader (apache#1330)

208d7ee

asfimport mentioned this pull request Jun 23, 2024

Implement vectored IO in parquet file format #2703

Closed
