-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add option to set header row #453
Conversation
Yes, the same. See first four tests https://github.com/dimastbk/python-calamine/blob/master/tests/test_base.py#L11 |
src/xlsx/mod.rs
Outdated
#[derive(Debug, Default)] | ||
pub struct XlsxOptions { | ||
/// By default, calamine skips empty rows until a nonempty row is found | ||
pub keep_first_empty_rows: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a more generic approach would be to have the option to set header row?
By default None
means that it'll skip all empty rows (like today) and then we could have with_header_row(self, row: u32)
?
I understand that >90% of use cases are the first row but it still feels better and not necessarily more complicated to give the option to the users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect will do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -1786,3 +1787,39 @@ fn test_ref_xlsb() { | |||
] | |||
); | |||
} | |||
|
|||
#[rstest] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't know about rstest. It seems neat.
EDIT: SORRY for the late answer!!
I don't see anything wrong here. I'd like the comments to reflect where/how it applies if possible, in particular if there is no option for other files.
While I'd love all files to be supported equally, I understand people focusing on their own issues first. If you don't mind trying to check other files i'd be great, else it is ok. Thanks anyway for your time!
I don't really know too. I suppose it could be something useful in more contexts but I haven't felt a strong need either. |
I'll try to tackle other file formats by the end of the week |
f4c3bd8
to
dba5b60
Compare
@tafia I added the support for ODS. I reckon having the option at workbook level is not appropriate. We could want to open a sheet with one header row and another sheet with another header row. /// Options for reading data
pub struct ReadDataOptions {
/// header row
pub header_row: Option<u32>,
...
}
pub trait Reader<RS>: Sized
where
RS: Read + Seek,
{
...
fn worksheet_range(&mut self, name: &str, options: ReadDataOptions) -> Result<Range<Data>, Self::Error>;
...
fn worksheet_range_at(&mut self, n: usize, options: ReadDataOptions) -> Option<Result<Range<Data>, Self::Error>> {
let name = self.sheet_names().get(n)?.to_string();
Some(self.worksheet_range(&name, options))
}
...
}
pub trait ReaderRef<RS>: Reader<RS>
where
RS: Read + Seek,
{
fn worksheet_range_ref<'a>(&'a mut self, name: &str, options: ReadDataOptions)
-> Result<Range<DataRef<'a>>, Self::Error>;
fn worksheet_range_at_ref(&mut self, n: usize, options: ReadDataOptions) -> Option<Result<Range<DataRef>, Self::Error>> {
let name = self.sheet_names().get(n)?.to_string();
Some(self.worksheet_range_ref(&name, options))
}
} The issue is that it would break the API |
@PrettyWood would your solution work for this use case? |
@ACronje yup! |
Even though my Rust knowledge is pretty limited I would be willing to help get this over the line @PrettyWood @tafia . We really need this! |
The rust part is not really a problem and I can finish the job easily, it's more a problem of API ;) |
@PrettyWood since adding options like you've suggested above would be a breaking change, and since adding a new method like I think this could be useful for the other file types that do not support lazy iteration (they are loading the sheets into memory at instantiation time I believe? see: ods) and therefore might decide to use this sheet-specific config to, for example, be more efficient with loading sheet data (e.g. only keeping in memory the sheets defined in the config) |
I first went with the |
I understand what you're saying and I saw your implementation. I explained why I think it's acceptable for the sheet options to be at the workbook level. |
Thanks for sharing your opinion. I'll finish implementing all the extensions and let the EDIT: it is now implemented for all extensions. I went for an associated type if we want to have that at reader level. I still reckon we should have options at workbook level and options at sheet level. And |
Huge thanks to @PrettyWood for this improvement! Our team depends on this PR to fully adopt calamine. @tafia Even though you've already approved the PR, I believe @PrettyWood is still waiting for you to take another look at the post-approval changes before merging. Any chance you could take a look in the coming days? Or can he go ahead and merge? Thanks a lot! |
Thanks a lot for the extra effort! LGTM, just one tiny refactor I believe makes the intent clearer. |
d481deb
to
3a5543d
Compare
3a5543d
to
81adf58
Compare
81adf58
to
72e74b0
Compare
a7e41ef
to
625fb73
Compare
625fb73
to
d56c7f9
Compare
@PrettyWood let me know once you think you're done. I'm happy to merge it as it is already. |
Ok @tafia sorry for those last changes but I prefer switching to an enum. It's more explicit and I plan on adding If that's ok for you this PR is ready to merge on my end |
Awesome thanks! |
We have an issue ToucanToco/fastexcel#209 that has been getting some popularity
People want to be able to set the "header row" knowing where it starts and the fact that calamine skips by default empty rows at the beginning is not always wanted.
I suggest we add an option to be able to change the default behavior directly in calamine. It will be a bit less performant for the ones who enable this option since the whole vec of cells might be shifted but has no impact on people who keep the default behavior.
Note: I decided to tackle only xlsx files for now because
type Option
in theReader
traitrelated to #147 (comment)