Air Traffic Control (ATC) corpora #1061
lhotse/bin/modes/recipes/__init__.py
@@ -63,6 +64,7 @@
from .timit import *
from .vctk import *
from .voxceleb import *
from .uwb_atcc import *
This import should go above vctk, to keep the list in alphabetical order.
lhotse/recipes/atcosim.py
)
recording = Recording.from_file(wav_path, recording_id=row.recording_id)
segment = SupervisionSegment(
    id="atcosim_%s_%06d_%06d" % (row.filename, 0, row.length_sec * 100),
We generally use f-strings for these since they are more readable, but it's your call.
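For comparison, here is a sketch of the f-string form of the id above. The helper name make_segment_id and the sample filename are hypothetical. One subtlety: "%06d" silently truncates a float such as row.length_sec * 100, whereas the f-string format spec "06d" raises ValueError for floats, so the value has to be converted explicitly.

```python
def make_segment_id(filename: str, length_sec: float) -> str:
    """Hypothetical helper: f-string version of the %-formatted ATCOSIM segment id."""
    # "{x:06d}" rejects floats, while "%06d" truncates them; int() keeps the
    # original truncating behaviour so the generated ids stay identical.
    end = int(length_sec * 100)
    return f"atcosim_{filename}_{0:06d}_{end:06d}"


print(make_segment_id("sm1_01_001", 2.5))  # atcosim_sm1_01_001_000000_000250
```

Because int() truncates, a float product like length_sec * 100 that lands just below an integer is rounded down, exactly as with "%06d". If that is not intended, round() would be the alternative, at the cost of changing existing ids.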
lhotse/recipes/uwb_atcc.py
from lhotse.utils import Pathlike, is_module_available, resumable_download


def safe_extract_rar(
Could you move this to lhotse/utils.py (since we also have safe_extract() there)?
Please also fix the style issues.
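A "safe" extraction helper typically refuses archive members whose paths would escape the destination directory (for example "../../etc/passwd" or absolute paths). The sketch below shows only that member-name check, using the standard library. The function names are hypothetical, and this is not claimed to be how lhotse's safe_extract() or the PR's safe_extract_rar() are implemented.

```python
import os
from typing import Iterable


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)  # abspath also normalizes ".." components
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def check_archive_members(names: Iterable[str], dest: str) -> None:
    """Raise ValueError if any archive member would be written outside `dest`."""
    for name in names:
        # os.path.join discards `dest` if `name` is absolute, which the check catches too.
        target = os.path.join(dest, name)
        if not is_within_directory(dest, target):
            raise ValueError(f"Attempted path traversal in archive member: {name!r}")
```

With the rarfile package, which mirrors the zipfile interface, one could call check_archive_members(rf.namelist(), dest) before rf.extractall(dest). Note that this checks member names only; symlink members would need additional handling.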
Thanks for the review! I have two questions on supervision text normalization:
Usually the normalization is recipe-specific; we don't enforce any particular normalization standard. If there is an existing Kaldi or ESPnet (or other popular toolkit) recipe for the data, we encourage recipe writers to provide a normalization option that matches those recipes, so that results are comparable. You can find such methods in …

Regarding your other question, you would usually just write the letters out separately in the transcript, e.g., "A B C". Most modern end-to-end ASR models should be able to learn that individually spelled-out characters correspond to those sounds.
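Both suggestions can be combined in a sketch of a recipe-level normalization option. It leaves transcripts untouched by default and, when enabled, writes listed acronyms out as separate letters. The function name, the mode values "none"/"upper", and the acronyms parameter are all hypothetical and are not an existing lhotse API. "ILS" is just an example ATC term.

```python
from typing import AbstractSet


def normalize_text(
    text: str, mode: str = "none", acronyms: AbstractSet[str] = frozenset()
) -> str:
    """Hypothetical recipe-level normalizer with a selectable mode.

    mode="none"  -> return the original transcript unchanged
    mode="upper" -> upper-case, collapse whitespace, spell out listed acronyms
    """
    if mode == "none":
        return text
    if mode == "upper":
        words = text.upper().split()
        # Acronyms are matched after upper-casing, so list them in upper case.
        # Each listed acronym becomes space-separated letters, e.g. "ILS" -> "I L S".
        return " ".join(" ".join(w) if w in acronyms else w for w in words)
    raise ValueError(f"Unknown normalization mode: {mode!r}")


print(normalize_text("cleared  ils approach", mode="upper", acronyms={"ILS"}))
# CLEARED I L S APPROACH
```

Keeping "none" as the default means the raw corpus transcripts stay available, and a recipe can add further modes that mirror an existing Kaldi or ESPnet recipe.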
Just for your interest:
Branch updated from 7241876 to ae21640.
Another question: is there any particular reason why many recipes round supervision segment durations with ndigits=8? I am currently not rounding durations in my ATC recipes. When iterating over the datasets with K2SpeechRecognitionDataset, I observe that for some batches the time dimension T (num_frames) of the input features does not match the maximum supervision num_frames. For these bad batches, the input T dimension is always one frame larger than the maximum MonoCut.features.num_frames in the batch.
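The thread does not identify the cause of the mismatch. One common way a one-frame discrepancy can arise is when two code paths count frames with different rounding conventions. The illustration below assumes 16 kHz audio and a 10 ms frame shift (160 samples). The function num_frames is hypothetical and is not lhotse's implementation. Real feature extractors also depend on window length and edge handling.

```python
def num_frames(num_samples: int, hop: int, mode: str) -> int:
    """Count frames for `num_samples` samples with frame shift `hop` (in samples)."""
    if mode == "floor":  # drop a trailing partial hop
        return num_samples // hop
    if mode == "ceil":  # count a trailing partial hop as a full frame
        return -(-num_samples // hop)
    if mode == "round":  # round to nearest, halves going up
        return (num_samples + hop // 2) // hop
    raise ValueError(f"Unknown mode: {mode!r}")


# 40080 samples at 16 kHz is 2.505 s, i.e. 250.5 hops of 160 samples.
for mode in ("floor", "ceil", "round"):
    print(mode, num_frames(40080, 160, mode))
```

For a sample count that is an exact multiple of the hop, all three conventions agree. Otherwise they can differ by one frame, so it is worth checking that features and supervisions are counted the same way.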
Because duration is often computed as
I don't think rounding would contribute to that. The issue you are seeing may be related to some padding/collation edge cases; we could try to debug it with more info.
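The reply above is cut off, but the general motivation for rounding float durations can be illustrated. Arithmetic on seconds stored as floats leaves tiny residues. For example, 1.3 - 1.2 is not exactly 0.1, so rounding to a fixed number of digits makes durations compare and serialize consistently. This is a minimal sketch: segment_duration is a hypothetical helper, and ndigits=8 simply mirrors the value mentioned in the question. The thread does not state lhotse's exact rationale.

```python
def segment_duration(start: float, end: float, ndigits: int = 8) -> float:
    """Duration of [start, end) in seconds, rounded to suppress float noise."""
    return round(end - start, ndigits)


print(1.3 - 1.2)                   # 0.10000000000000009 (raw float subtraction)
print(segment_duration(1.2, 1.3))  # 0.1
```

Eight decimal digits is far finer than any audio sample period (62.5 microseconds at 16 kHz), so this rounding removes float noise without changing the sample-level meaning of a duration.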
@rouseabout @desh2608 Is this ready to merge? If yes, LGTM :)
It's good to go from my side. |