Add strict argument to audformat.utils.hash() by hagenw · Pull Request #454 · audeering/audformat

hagenw · 2024-07-15T10:11:51Z

Closes #447

Adds the strict argument to audformat.utils.hash().
If strict is True, the hash takes the order of rows and the names of index levels and columns into account. The returned hash is then identical to the one attached to parquet table files.

A hashing variant that takes into account the order of rows, name of columns and levels, and the data types was needed for calculating a hash for parquet tables as introduced in #419. Before, this was handled by a private method, independent of audformat.utils.hash().
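To illustrate the difference between the two modes, here is a minimal sketch that uses only the standard library. It is not audformat's implementation: the function toy_table_hash, its parameters, and the dict-based table representation are hypothetical and only mimic the idea that the default hash ignores row order and names, whereas the strict hash includes them.

```python
import hashlib


def toy_table_hash(columns, index, *, index_name="idx", strict=False):
    """Hash a toy table given as {column_name: values} plus index labels.

    Hypothetical illustration only, not the audformat.utils.hash() code.
    """
    # One tuple per row: (index label, value of column 1, value of column 2, ...)
    rows = list(zip(index, *columns.values()))
    md5 = hashlib.md5()
    if strict:
        # Names of the index level and of the columns contribute to the hash,
        # and rows are hashed in the order they are given
        md5.update(repr((index_name, tuple(columns))).encode())
    else:
        # Sorting the rows makes the result independent of the row order;
        # names are not hashed at all
        rows = sorted(rows)
    md5.update(repr(rows).encode())
    return md5.hexdigest()


original = ({"x": [1, 2]}, ["a", "b"])
reordered = ({"x": [2, 1]}, ["b", "a"])  # same rows, different order
renamed = ({"y": [1, 2]}, ["a", "b"])  # same values, different column name

# Default mode: all three toy tables share one hash
print(toy_table_hash(*original) == toy_table_hash(*reordered))
print(toy_table_hash(*original) == toy_table_hash(*renamed))

# Strict mode: reordering rows or renaming a column changes the hash
print(toy_table_hash(*original, strict=True) == toy_table_hash(*reordered, strict=True))
print(toy_table_hash(*original, strict=True) == toy_table_hash(*renamed, strict=True))
```

In this sketch the first two comparisons are equal and the last two are not, which is the behavior the strict argument is described as switching on.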

codecov · 2024-07-15T10:15:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.0%. Comparing base (a8cd511) to head (7514c60).

Additional details and impacted files

Files	Coverage Δ
audformat/core/table.py	`100.0% <100.0%> (ø)`
audformat/core/utils.py	`100.0% <100.0%> (ø)`

hagenw · 2024-07-15T10:18:18Z

include_order_and_names is descriptive, but also slightly long as an argument name. strict might be a better choice?

ChristianGeng

API-wise, I think it is nice to keep a single utils.hash as before and to distinguish the calculations based on the include_order_and_names keyword argument.

I cannot do a technical review of the md5 algorithm though, so I am not attempting one.
The test coverage is fairly extensive. This will help a lot in discovering implementation changes on the pandas side, or even in the unlikely case that something changes in hashlib.md5.

I do not feel strongly about the name of the keyword argument, be it "strict" or "include_order_and_names", as long as the motivation becomes clear. The code and the issue description already contain links to the pandas issue where this was discussed. I wonder whether the links should even be extended to point to the audformat issues that led to the implementation as is (the pyarrow metadata hash, effectively clarifying why this was needed)?

Apart from that I think that this is good and can be approved.

hagenw · 2024-07-17T08:16:56Z

I added a reference to #419 in the description and renamed the argument to strict, thanks for the suggestions.

hagenw added 2 commits July 15, 2024 12:08

Add include_order_and_names argument to hash()

7245d06

Extend example

ad3f5b3

hagenw requested a review from ChristianGeng July 15, 2024 10:15

ChristianGeng approved these changes Jul 16, 2024

Rename argument to strict

7514c60

hagenw changed the title ~~Add include_order_and_names argument to hash()~~ Add strict argument to audformat.utils.hash() Jul 17, 2024

hagenw merged commit 79d6246 into main Jul 17, 2024

hagenw deleted the hash-argument branch July 17, 2024 08:17

hagenw merged 3 commits into main from hash-argument
