Conversation
|
When I see length, I think in seconds. I like the frames approach too, and I'd like it spelled out explicitly (num_frames or whatever). I'd like to see:
Would love to hear other feedback as well while I read into it a bit more. |
|
Added seconds and sampling rates |
isaac-chung
left a comment
There was a problem hiding this comment.
Sorry for adding more. Revisited some papers and maybe we should use the standard measure of audio dataset size.
| unique_audios: int | ||
|
|
||
| average_sampling_rate: float | ||
| sampling_rates: dict[int, int] |
There was a problem hiding this comment.
Could this just be a unique set of sampling rates? OK either way.
| sampling_rates: dict[int, int] | |
| sampling_rates: list[int] |
There was a problem hiding this comment.
I think it's better to keep dict to show full distribution of different sample rates. If this became a problem, we can easily change to list of ints
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
|
@isaac-chung Can you review this PR? |
# Conflicts: # pyproject.toml # uv.lock
There was a problem hiding this comment.
Judging from the size, does this incorporate changes from #3875 too?
There was a problem hiding this comment.
Yes, I used these changes here too

Ref #3498
I’ve started integrating audio statistics. For now, I’ve come up with this format. Do you have any suggestions?