Skip to content

Conversation

@LouisClt
Copy link
Contributor

@LouisClt LouisClt commented Dec 1, 2022

@github-actions
Copy link

github-actions bot commented Dec 1, 2022

@github-actions
Copy link

github-actions bot commented Dec 1, 2022

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@pitrou
Copy link
Member

pitrou commented Dec 5, 2022

Why not expose the full stripe information? We don't want to create tons of little accessors for each different piece of information.

@LouisClt
Copy link
Contributor Author

LouisClt commented Dec 6, 2022

Yes, why not.
The structure storing the different informations available is located only in the implementation :

struct StripeInformation {
  uint64_t offset;
  uint64_t length;
  uint64_t num_rows;
  uint64_t first_row_of_stripe;
};

The offset is the number of bytes since the beginning of the file, in bytes, the length is the size of the stripe in bytes, num_rows is the number of rows in the stripe, and "first_row_of_strip" is the index of the first row in the stripe (thus the sum of the num_rows of all the preceding stripes).
We could just put the definition of this struct in the header, with comments on the use of each of these members, and return this struct in the getter.
But maybe some members would benefit from a renaming...

@pitrou
Copy link
Member

pitrou commented Dec 6, 2022

We can change it slightly and expose it as:

struct StripeInformation {
  int64_t offset;
  int64_t length;
  int64_t num_rows;
  int64_t first_row_id;
};

(note the use of signed integers to match our API conventions better)

@LouisClt
Copy link
Contributor Author

LouisClt commented Dec 6, 2022

Ok, that seems fine to me. I will make the changes then.
A prototype like this would be OK ?

StripeInformation GetStripeInformation(int64_t stripe);

@pitrou
Copy link
Member

pitrou commented Dec 6, 2022

Yes, that sounds fine.

@LouisClt
Copy link
Contributor Author

LouisClt commented Dec 9, 2022

So this seems to be good now. I added the "StripeInformation" class in the header, and modified the implementation accordingly.
I added one more small test on that.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @LouisClt

@pitrou pitrou changed the title ARROW-18421: [C++][ORC] Add accessor for number of rows by stripe in reader ARROW-18421: [C++][ORC] Add accessor for stripe information in reader Dec 12, 2022
@pitrou pitrou merged commit 3bff2c9 into apache:master Dec 12, 2022
@LouisClt LouisClt deleted the AddNumberOfRowsByStripe branch December 12, 2022 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants