Fix issues with ASR notebooks #2698

Merged 1 commit on Aug 19, 2021
7 changes: 4 additions & 3 deletions tutorials/asr/ASR_with_Transducers.ipynb
@@ -143,6 +143,7 @@
"source": [
"import wget\n",
"import tarfile \n",
"import subprocess \n",
"import glob\n",
"\n",
"data_dir = \"datasets\"\n",
@@ -616,7 +617,7 @@
"\n",
"4) Feed this $U_{sub-batch}$ into the Joint model, along with a sub-batch from the Acoustic model (with $T_{sub-batch} < T$). Remember, we only have to slice off a part of the acoustic model here since we have the full batch of samples $(B, T, D)$ from the acoustic model.\n",
"\n",
"5) Perfoming steps (3) and (4) yields $T_{sub-batch}$ and $U_{sub-batch}$. Perform sub-batch joint step - costing an intermediate $(B, T_{sub-batch}, U_{sub-batch}, V)$ in memory.\n",
"5) Performing steps (3) and (4) yields $T_{sub-batch}$ and $U_{sub-batch}$. Perform sub-batch joint step - costing an intermediate $(B, T_{sub-batch}, U_{sub-batch}, V)$ in memory.\n",
"\n",
"6) Compute loss on sub-batch and preserve in a list to be later concatenated. \n",
"\n",
@@ -800,7 +801,7 @@
"id": "1NkfrA2l6DBF"
},
"source": [
"[link text](https:// [link text](https://))## (Optional) Partially loading pre-trained weights from another model\n",
"## (Optional) Partially loading pre-trained weights from another model\n",
"\n",
"An interesting point to note about Transducer models - the Acoustic model config (and therefore the Acoustic model itself) can be shared between CTC and Transducer models.\n",
"\n",
@@ -1304,7 +1305,7 @@
"source": [
"------\n",
"\n",
"Finally, let us calculate the alignment grid. We will de-tokenze the sub-word token if it is a valid index in the vocabulary and use `''` as a placeholder for the `Transducer Blank` token.\n",
"Finally, let us calculate the alignment grid. We will de-tokenize the sub-word token if it is a valid index in the vocabulary and use `''` as a placeholder for the `Transducer Blank` token.\n",
"\n",
"Note that each `timestep` here is (roughly) $timestep * total\\_stride\\_of\\_model * preprocessor.window\\_stride$ seconds timestamp. \n",
"\n",
8 changes: 4 additions & 4 deletions tutorials/asr/Intro_to_Transducers.ipynb
@@ -423,7 +423,7 @@
"source": [
"## Model Defaults\n",
"\n",
"Since the transducer model is comprised of three seperate models working in unison, it is practical to have some shared section of the config. That shared section is called `model.model_defaults`."
"Since the transducer model is comprised of three separate models working in unison, it is practical to have some shared section of the config. That shared section is called `model.model_defaults`."
]
},
{
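A hedged sketch of what such a shared section can look like; the key names follow common NeMo Transducer configs, but treat them as an example rather than a specification.

```python
from omegaconf import OmegaConf

model_defaults = OmegaConf.create({
    "model_defaults": {
        "enc_hidden": 640,    # encoder output size, re-used by the Joint config
        "pred_hidden": 640,   # prediction (decoder) network hidden size
        "joint_hidden": 640,  # joint network hidden size
    }
})
print(OmegaConf.to_yaml(model_defaults))
```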
@@ -553,7 +553,7 @@
"\n",
"The Joint model config has several essential components which we discuss below :\n",
"\n",
"1) `log_softmax`: Due to the cost of computing softmax on such large tensors, the Numba CUDA implementation of RNNT loss will implicitly compute the log softmax when called (so its inputs should be logits). The CPU version of the loss doesnt face such memory issues so it requires log-probabilities instead. Since the behaviour is different for CPU-GPU, the `None` value will automatically switch behaviour dependent on whether the input tensor is on a CPU or GPU device.\n",
"1) `log_softmax`: Due to the cost of computing softmax on such large tensors, the Numba CUDA implementation of RNNT loss will implicitly compute the log softmax when called (so its inputs should be logits). The CPU version of the loss doesn't face such memory issues so it requires log-probabilities instead. Since the behaviour is different for CPU-GPU, the `None` value will automatically switch behaviour dependent on whether the input tensor is on a CPU or GPU device.\n",
"\n",
"2) `preserve_memory`: This flag will call `torch.cuda.empty_cache()` at certain critical sections when computing the Joint tensor. While this operation might allow us to preserve some memory, the empty_cache() operation is tremendously slow and will slow down training by an order of magnitude or more. It is available to use but not recommended.\n",
"\n",
@@ -648,9 +648,9 @@
"source": [
"-------\n",
"\n",
"This argument `max_symbols` is the maximum number of `target token` decoding steps $u \\le U$ per acoustic timestep $t \\le T$. Note that during training, this was implicitly constrained by the shape of the joint matrix (max_symbols = $U$). However, there is no such $U$ upper bound during inference (we dont have the ground truth $U$).\n",
"This argument `max_symbols` is the maximum number of `target token` decoding steps $u \\le U$ per acoustic timestep $t \\le T$. Note that during training, this was implicitly constrained by the shape of the joint matrix (max_symbols = $U$). However, there is no such $U$ upper bound during inference (we don't have the ground truth $U$).\n",
"\n",
"So we explicitly set a heuristic upper bound on how many decoding steps can be performed per acoustic timestep. Generally a value of 5 and above is suffcient."
"So we explicitly set a heuristic upper bound on how many decoding steps can be performed per acoustic timestep. Generally a value of 5 and above is sufficient."
]
},
{
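To make the role of `max_symbols` concrete, a minimal greedy decoding sketch; this is not NeMo's actual implementation, and `decoder_step` and `joint_fn` are hypothetical stand-ins for the prediction network and joint network.

```python
import torch

def greedy_rnnt_decode(enc_out, decoder_step, joint_fn, blank_id, max_symbols=5):
    """Greedy decoding sketch for a single utterance; enc_out has shape (T, D).
    At each acoustic timestep t, at most `max_symbols` non-blank tokens are
    emitted before the loop is forced to advance to timestep t + 1."""
    hypothesis, state, last_token = [], None, None
    for t in range(enc_out.size(0)):
        emitted = 0
        while emitted < max_symbols:
            dec_out, new_state = decoder_step(last_token, state)  # prediction network step
            logits = joint_fn(enc_out[t], dec_out)                # (V + 1,) joint scores
            token = int(torch.argmax(logits))
            if token == blank_id:
                break                                             # blank: move to the next timestep
            hypothesis.append(token)
            last_token, state = token, new_state
            emitted += 1                       # the cap stops unbounded emission at a single t
    return hypothesis
```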