Air Traffic Control (ATC) corpora #1061

rouseabout · 2023-05-16T13:10:03Z

No description provided.

desh2608 · 2023-05-16T13:36:39Z

lhotse/bin/modes/recipes/__init__.py

@@ -63,6 +64,7 @@
 from .timit import *
 from .vctk import *
 from .voxceleb import *
+from .uwb_atcc import *


This should go above vctk, to keep the imports in alphabetical order.

desh2608 · 2023-05-16T13:51:50Z

lhotse/recipes/atcosim.py

+            )
+            recording = Recording.from_file(wav_path, recording_id=row.recording_id)
+            segment = SupervisionSegment(
+                id="atcosim_%s_%06d_%06d" % (row.filename, 0, row.length_sec * 100),


We generally use f-strings for these since they are more readable, but it's your call.
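For illustration, here is a minimal sketch of the segment ID from the snippet above written as an f-string. One detail matters here: `%06d` silently truncates a float such as `row.length_sec * 100`, but the f-string `d` format code raises `ValueError` for floats. The value therefore needs an explicit `int()`. The helper name `atcosim_segment_id` and the sample filename are hypothetical.

```python
def atcosim_segment_id(filename: str, length_sec: float) -> str:
    """Hypothetical helper: build an ATCOSIM-style supervision ID with an f-string."""
    # f"{x:06d}" rejects floats, so truncate explicitly, matching what "%06d" does.
    end = int(length_sec * 100)
    return f"atcosim_{filename}_{0:06d}_{end:06d}"


# Both spellings produce the same ID for a made-up filename.
old_style = "atcosim_%s_%06d_%06d" % ("sample", 0, 2.5 * 100)
new_style = atcosim_segment_id("sample", 2.5)
print(old_style == new_style, new_style)  # True atcosim_sample_000000_000250
```

Whether the end time should be truncated or rounded is a recipe decision. The sketch keeps the truncation behaviour of the original `%` expression.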

desh2608 · 2023-05-16T13:54:55Z

lhotse/recipes/uwb_atcc.py

+from lhotse.utils import Pathlike, is_module_available, resumable_download
+
+
+def safe_extract_rar(


Could you move this to lhotse/utils.py (since we also have safe_extract() there)?
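A `safe_extract`-style helper exists to reject archive members whose paths would land outside the destination directory, whether through `../` components or absolute paths. Below is a minimal, library-agnostic sketch of that check. It assumes only that the member names are available. The function names are hypothetical and are not lhotse's actual API.

```python
import os
from typing import Iterable, List


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath compares whole path components, so "out" does not match "outside".
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def check_member_names(directory: str, names: Iterable[str]) -> List[str]:
    """Hypothetical helper: validate archive member names before extraction.

    Raises ValueError on the first name that would escape `directory`.
    """
    checked = []
    for name in names:
        # os.path.join discards `directory` if `name` is absolute, and
        # abspath() collapses "..", so both kinds of traversal are caught.
        if not is_within_directory(directory, os.path.join(directory, name)):
            raise ValueError(f"Refusing to extract outside {directory!r}: {name!r}")
        checked.append(name)
    return checked
```

With the third-party `rarfile` package, the names could come from `RarFile.namelist()` before calling `extractall()`. That pairing is an assumption and should be checked against the `rarfile` documentation.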

desh2608 · 2023-05-16T15:05:34Z

Please also fix the style issues.

rouseabout · 2023-05-17T11:49:38Z

Thanks for the review! I have two questions on supervision text normalization:

Should supervision text be upper- or lowercase? It seems many recipes use uppercase.
What is the best way to express individually spelt-out English letters? The current patch uses tilde prefixes inspired by the ATCOSIM paper. For example, the supervision text "~a ~b ~c" is spoken as "aye bee see".

desh2608 · 2023-05-17T18:04:45Z

> Thanks for the review! I have two questions on supervision text normalization:
>
> Should supervision text be upper- or lowercase? It seems many recipes use uppercase.
>
> What is the best way to express individually spelt-out English letters? The current patch uses tilde prefixes inspired by the ATCOSIM paper. For example, the supervision text "~a ~b ~c" is spoken as "aye bee see".

Usually the normalization is recipe-specific; we don't enforce any particular normalization standard. If there is an existing Kaldi or ESPnet (or other popular toolkit) recipe for the data, we encourage recipe writers to provide a normalization option that matches those recipes, so that results are comparable. You can find such methods in lhotse/recipes/utils.py. You can add your normalization method to that script if you think it may apply to other corpora as well. For ASR corpora, if orthographic transcription is not required, you can provide upper-casing and punctuation removal as the normalization.
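Here is a minimal sketch of the "upper-casing and punctuation removal" option. It handles ASCII punctuation only and keeps apostrophes so contractions survive. The function name is hypothetical and is not an existing helper in lhotse/recipes/utils.py.

```python
import string

# Map every ASCII punctuation character except the apostrophe to a space.
_PUNCT_TO_SPACE = str.maketrans({c: " " for c in string.punctuation if c != "'"})


def normalize_upper_no_punct(text: str) -> str:
    """Hypothetical normalizer: strip punctuation, upper-case, collapse whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).upper().split())


print(normalize_upper_no_punct("Lufthansa 4-5-1, climb to flight level 320."))
# LUFTHANSA 4 5 1 CLIMB TO FLIGHT LEVEL 320
```

The sketch replaces punctuation with a space rather than deleting it, which keeps hyphenated digit groups like "4-5-1" as separate tokens. Also, `~` is in `string.punctuation`, so this normalizer would turn the tilde-prefixed letters "~a ~b ~c" into "A B C" as well.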

Regarding your other question, you would usually just write the letters out separately in the transcript, e.g., "A B C", and most modern end-to-end ASR models should be able to learn that individually spelled-out characters correspond to those sounds.
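A recipe might keep the rest of the transcript untouched and only convert the tilde notation into plain letters. In that case a targeted substitution is enough. The following is a minimal sketch. It assumes each spelled-out letter is written as one tilde followed by a single ASCII letter, the convention described in the question. The function name is hypothetical.

```python
import re

# A tilde followed by exactly one ASCII letter that ends a word, e.g. "~a".
_SPELLED_LETTER = re.compile(r"~([A-Za-z])\b")


def expand_spelled_letters(text: str) -> str:
    """Hypothetical helper: rewrite "~a ~b ~c" as "A B C", leaving other text alone."""
    return _SPELLED_LETTER.sub(lambda m: m.group(1).upper(), text)


print(expand_spelled_letters("cleared ~i ~l ~s approach"))
# cleared I L S approach
```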

danpovey · 2023-05-18T05:02:19Z

Just for your interest:
for future systems we have in mind totally un-normalized operation, where we do essentially no normalization at all on the source text except removing things like HTML markup. We'll use BPE encoding with fallback to bytes (or just bytes themselves) so that any UTF-8 sequence can be encoded.
The results of training a system like this on a bunch of LibriVox data seem, anecdotally, very good: it outputs good-quality punctuation.
Of course, we'll still optionally do normalization for scoring purposes so we can compare with the traditional WER metric.
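In "BPE with fallback to bytes", tokens come from a subword vocabulary wherever possible. Any character the vocabulary cannot cover is emitted as its raw UTF-8 bytes, so no input is ever out-of-vocabulary. The sketch below stands in for a trained BPE model with greedy longest-match over a tiny hand-made vocabulary. This is a deliberate simplification, not real BPE merges. It borrows the `<0xNN>` byte-token spelling used by SentencePiece's byte fallback.

```python
from typing import List, Set


def encode(text: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match subword encoding with byte fallback."""
    max_len = max((len(piece) for piece in vocab), default=0)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary piece that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No piece matched: emit this character's UTF-8 bytes as byte tokens.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


def decode(tokens: List[str]) -> str:
    """Inverse of encode(): byte tokens are reassembled before UTF-8 decoding."""
    out = bytearray()
    for tok in tokens:
        if len(tok) == 6 and tok.startswith("<0x") and tok.endswith(">"):
            out.append(int(tok[3:5], 16))
        else:
            out.extend(tok.encode("utf-8"))
    return out.decode("utf-8")


vocab = {"tra", "ffic", " "}
tokens = encode("traffic ✈", vocab)
print(tokens)  # ['tra', 'ffic', ' ', '<0xE2>', '<0x9C>', '<0x88>']
print(decode(tokens))  # traffic ✈
```

A real tokenizer keeps byte tokens as reserved IDs rather than strings. That avoids any clash with vocabulary pieces that happen to look like `<0x41>`.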

rouseabout · 2023-05-18T13:31:15Z

Another question: is there any particular reason why many recipes round the supervision segment duration with ndigits=8?

I am currently not using rounding in my ATC recipes. When iterating over the datasets using K2SpeechRecognitionDataset, I observe that for some batches the T dimension (num_frames) of the batch input features does not match the maximum num_frames value of the batch supervisions. For these bad batches, the batch input T dimension is always 1 frame larger than the maximum value of MonoCut.features.num_frames in the batch.

pzelasko · 2023-05-22T16:16:18Z

> Another question: is there any particular reason why many recipes round the supervision segment duration with ndigits=8?

Because duration is often computed as end - start from another source, and that introduces floating-point precision errors, which are annoying.
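Here is a short illustration of the float issue. Subtracting two decimal timestamps rarely yields the exact decimal difference, and rounding to 8 digits snaps the result back. The timestamps are made up for the example.

```python
start, end = 0.1, 0.3  # made-up segment boundaries in seconds

duration = end - start
print(duration)  # 0.19999999999999998, not 0.2

# Rounding to 8 decimal digits removes the representation error while keeping
# far finer resolution than one sample period at any common audio sampling rate.
rounded = round(end - start, ndigits=8)
print(rounded)  # 0.2
print(rounded == 0.2)  # True
```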

> I am currently not using rounding in my ATC recipes. When iterating over the datasets using K2SpeechRecognitionDataset, I observe that for some batches the T dimension (num_frames) of the batch input features does not match the maximum num_frames value of the batch supervisions. For these bad batches, the batch input T dimension is always 1 frame larger than the maximum value of MonoCut.features.num_frames in the batch.

I don't think rounding would contribute to that. The issue you are seeing may be related to some padding/collation edge cases; we could try to debug it with more information.
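One way to gather that information is to log every batch whose padded time dimension differs from its longest cut. A minimal sketch follows. It assumes you have already extracted, for each batch, the feature tensor's T dimension and the per-cut frame counts (e.g. from `MonoCut.features.num_frames`). The function name and this input layout are hypothetical and not part of lhotse.

```python
from typing import Iterable, List, Sequence, Tuple


def find_padding_mismatches(
    batches: Iterable[Tuple[int, Sequence[int]]],
) -> List[Tuple[int, int, int]]:
    """Hypothetical check: report batches whose padded length != longest cut.

    Each item is (padded_num_frames, per_cut_num_frames). Returns a list of
    (batch_index, padded_num_frames, max_cut_num_frames) for mismatching batches.
    """
    mismatches = []
    for idx, (padded, cut_frames) in enumerate(batches):
        longest = max(cut_frames)
        if padded != longest:
            mismatches.append((idx, padded, longest))
    return mismatches


# Made-up numbers: batch 1 reproduces the "one frame too long" symptom.
report = find_padding_mismatches([(500, [500, 430]), (313, [312, 298, 250])])
print(report)  # [(1, 313, 312)]
```

Knowing which cuts end up in the mismatching batches, along with their durations, is the kind of detail that helps narrow down a padding or collation edge case.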

pzelasko · 2023-05-23T18:48:52Z

@rouseabout @desh2608 Is this ready to merge? If yes, LGTM :)

desh2608 · 2023-05-23T19:57:52Z

> @rouseabout @desh2608 Is this ready to merge? If yes, LGTM :)

It's good to go from my side.

desh2608 reviewed May 16, 2023


rouseabout added 2 commits May 18, 2023 22:56

Add UWB-ATCC corpus (d0b9932)
Add ATCOSIM corpus (ae21640)

rouseabout force-pushed the air-traffic-control-corpora branch from 7241876 to ae21640 May 18, 2023 12:56

desh2608 added the corpus label May 22, 2023

Merge branch 'master' into air-traffic-control-corpora (610a954)

pzelasko approved these changes May 23, 2023


pzelasko enabled auto-merge (squash) May 23, 2023 20:30

pzelasko merged commit ed8620d into lhotse-speech:master May 23, 2023

pzelasko added this to the v1.15 milestone May 23, 2023
