Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usability of finding a header row and skipping rows #147

Open
TimoFreiberg opened this issue Mar 31, 2019 · 4 comments
Open

Usability of finding a header row and skipping rows #147

TimoFreiberg opened this issue Mar 31, 2019 · 4 comments

Comments

@TimoFreiberg
Copy link

TimoFreiberg commented Mar 31, 2019

I have to read excel sheets with headers where the header row is not the first row.
I currently try to solve this by first using range.rows().enumerate().find(...) to get a row that contains the header names I expect and then using the index of that row to get a subrange of my original range.

I introduced a bug by using the index I got from that process as the new start and therefore started at e.g. row 3 instead of 26. I fixed that by adding the found header index to the current range start.
I guess this is kind of hard to use :/

@TimoFreiberg TimoFreiberg changed the title Range.rows() skips nonempty rows, no way to get absolute index of a row Usability of finding a header row and skipping rows Mar 31, 2019
@tafia
Copy link
Owner

tafia commented Apr 8, 2019

Sorry I probably do not understand your error.

Are you using Range::range method?
Are you then deserializing with this subrange?

@TimoFreiberg
Copy link
Author

Yeah. This is about usability/ergonomics.

Since my header row wasn't the first row, I had to first find the correct row and then skip n rows, where n was the index of the header row when iterating over Range.rows().enumerate().

Having to find the header row manually was a bit annoying, but I'm not sure how the library would offer this functionality in a natural way.

Then I had to skip n rows, but I didn't notice that only nonempty rows would be returned by Range.rows(), so I called my_range.range((n, 0), my_range.end().unwrap()), although instead of n I should have used my_range.start().unwrap().0 + n (i.e. skip n rows).

I might not have made the second mistake if Range had a skipRows(self, n: u32) -> Range method because I wouldn't have noticed the error in my mental model of the Rows iterator.
I also wouldn't have had to write this helper ;)

fn skip_rows(range: Range<DataType>, n: u32) -> Result<Range<DataType>, Error> {
    if range.is_empty() {
        return Err(Error::new("Range was empty"));
    }
    let start = range.start().unwrap();
    let end = range.end().unwrap();
    Ok(range.range((start.0 + n, start.1), end))
}

@tafia
Copy link
Owner

tafia commented Apr 11, 2019

but I didn't notice that only nonempty rows would be returned by Range.rows().

It is not the case. Do you have an example?

I think what you're confused about is probably relative (usize)/absolute (u32) position. Range::range uses absolute index so indeed you should build that from my_range.start.

What we could do is probably improve doc first. Then we could have a new Index<(Range<usize>, Range<usize>)> impl which would then use relative positions.

@TimoFreiberg
Copy link
Author

TimoFreiberg commented May 15, 2019

Sorry for the long silence

It is not the case. Do you have an example?

I was mistaken, it seems like Range.rows() skips empty rows until a nonempty row is found but then also returns empty rows.

Anyway, I wrote helper functions as in https://gist.github.com/TimoFreiberg/3cfc0163d5b1e5b3e4fab11f948b5299 and I thought maybe something like the skip_rows function could make sense as a Range method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants