From f347f771d4bc085e66b46aca70a3a6515d329cc8 Mon Sep 17 00:00:00 2001
From: Somshubra Majumdar
Date: Thu, 19 Aug 2021 09:17:32 -0700
Subject: [PATCH] Fix issues with ASR notebooks

Signed-off-by: smajumdar
---
 tutorials/asr/ASR_with_Transducers.ipynb | 7 ++++---
 tutorials/asr/Intro_to_Transducers.ipynb | 8 ++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/tutorials/asr/ASR_with_Transducers.ipynb b/tutorials/asr/ASR_with_Transducers.ipynb
index 3c131d843050..d94178a7bc13 100644
--- a/tutorials/asr/ASR_with_Transducers.ipynb
+++ b/tutorials/asr/ASR_with_Transducers.ipynb
@@ -143,6 +143,7 @@
    "source": [
     "import wget\n",
     "import tarfile \n",
+    "import subprocess \n",
     "import glob\n",
     "\n",
     "data_dir = \"datasets\"\n",
@@ -616,7 +617,7 @@
     "\n",
     "4) Feed this $U_{sub-batch}$ into the Joint model, along with a sub-batch from the Acoustic model (with $T_{sub-batch} < T$). Remember, we only have to slice off a part of the acoustic model here since we have the full batch of samples $(B, T, D)$ from the acoustic model.\n",
     "\n",
-    "5) Perfoming steps (3) and (4) yields $T_{sub-batch}$ and $U_{sub-batch}$. Perform sub-batch joint step - costing an intermediate $(B, T_{sub-batch}, U_{sub-batch}, V)$ in memory.\n",
+    "5) Performing steps (3) and (4) yields $T_{sub-batch}$ and $U_{sub-batch}$. Perform the sub-batch joint step, costing an intermediate $(B, T_{sub-batch}, U_{sub-batch}, V)$ tensor in memory.\n",
     "\n",
     "6) Compute loss on sub-batch and preserve in a list to be later concatenated. \n",
     "\n",
@@ -800,7 +801,7 @@
    "id": "1NkfrA2l6DBF"
   },
   "source": [
-    "[link text](https:// [link text](https://))## (Optional) Partially loading pre-trained weights from another model\n",
+    "## (Optional) Partially loading pre-trained weights from another model\n",
     "\n",
     "An interesting point to note about Transducer models - the Acoustic model config (and therefore the Acoustic model itself) can be shared between CTC and Transducer models.\n",
     "\n",
@@ -1304,7 +1305,7 @@
   "source": [
     "------\n",
     "\n",
-    "Finally, let us calculate the alignment grid. We will de-tokenze the sub-word token if it is a valid index in the vocabulary and use `''` as a placeholder for the `Transducer Blank` token.\n",
+    "Finally, let us calculate the alignment grid. We will de-tokenize the sub-word token if it is a valid index in the vocabulary and use `''` as a placeholder for the `Transducer Blank` token.\n",
     "\n",
     "Note that each `timestep` here is (roughly) $timestep * total\_stride\_of\_model * preprocessor.window\_stride$ seconds timestamp. \n",
     "\n",
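As a reference for the sub-batching procedure described in steps (3)-(6) of the hunk above, the fused joint-and-loss computation can be sketched roughly as follows. This is a minimal sketch under assumed shapes, not NeMo's actual implementation: `joint` and `loss_fn` are hypothetical stand-ins for a Joint network producing $(b, T_{sub-batch}, U_{sub-batch}, V)$ logits and a per-sample RNNT loss.

```python
import torch

def fused_joint_loss(enc, enc_lens, dec, dec_lens, joint, loss_fn, sub_batch=4):
    """enc: (B, T, D) acoustic output; dec: (B, U, D2) prediction network output."""
    losses = []
    for b0 in range(0, enc.size(0), sub_batch):
        b1 = b0 + sub_batch
        t_max = int(enc_lens[b0:b1].max())  # T_{sub-batch}: longest utterance in the slice
        u_max = int(dec_lens[b0:b1].max())  # U_{sub-batch}: longest transcript in the slice
        # The joint materializes only (b, T_sub, U_sub, V) instead of the full (B, T, U, V)
        logits = joint(enc[b0:b1, :t_max], dec[b0:b1, :u_max])
        # loss_fn is assumed to return one loss value per sample in the sub-batch
        losses.append(loss_fn(logits, enc_lens[b0:b1], dec_lens[b0:b1]))
    # Step (6): concatenate the preserved per-sub-batch losses
    return torch.cat(losses).mean()
```

This mirrors the fused-batch behaviour the notebooks discuss (exposed in the joint config's fused-batch options), trading a second pass over the batch for a much smaller peak joint tensor.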
diff --git a/tutorials/asr/Intro_to_Transducers.ipynb b/tutorials/asr/Intro_to_Transducers.ipynb
index 8189e7ed5bde..0387003c8bc6 100644
--- a/tutorials/asr/Intro_to_Transducers.ipynb
+++ b/tutorials/asr/Intro_to_Transducers.ipynb
@@ -423,7 +423,7 @@
   "source": [
    "## Model Defaults\n",
    "\n",
-    "Since the transducer model is comprised of three seperate models working in unison, it is practical to have some shared section of the config. That shared section is called `model.model_defaults`."
+    "Since the transducer model is comprised of three separate models working in unison, it is practical to have some shared section of the config. That shared section is called `model.model_defaults`."
   ]
  },
  {
@@ -553,7 +553,7 @@
     "\n",
     "The Joint model config has several essential components which we discuss below :\n",
     "\n",
-    "1) `log_softmax`: Due to the cost of computing softmax on such large tensors, the Numba CUDA implementation of RNNT loss will implicitly compute the log softmax when called (so its inputs should be logits). The CPU version of the loss doesnt face such memory issues so it requires log-probabilities instead. Since the behaviour is different for CPU-GPU, the `None` value will automatically switch behaviour dependent on whether the input tensor is on a CPU or GPU device.\n",
+    "1) `log_softmax`: Due to the cost of computing softmax on such large tensors, the Numba CUDA implementation of RNNT loss will implicitly compute the log softmax when called (so its inputs should be logits). The CPU version of the loss doesn't face such memory issues, so it requires log-probabilities instead. Since the behaviour differs between CPU and GPU, the `None` value will automatically switch behaviour depending on whether the input tensor is on a CPU or GPU device.\n",
     "\n",
     "2) `preserve_memory`: This flag will call `torch.cuda.empty_cache()` at certain critical sections when computing the Joint tensor. While this operation might allow us to preserve some memory, the empty_cache() operation is tremendously slow and will slow down training by an order of magnitude or more. It is available to use but not recommended.\n",
     "\n",
@@ -648,9 +648,9 @@
   "source": [
    "-------\n",
    "\n",
-    "This argument `max_symbols` is the maximum number of `target token` decoding steps $u \le U$ per acoustic timestep $t \le T$. Note that during training, this was implicitly constrained by the shape of the joint matrix (max_symbols = $U$). However, there is no such $U$ upper bound during inference (we dont have the ground truth $U$).\n",
+    "This argument `max_symbols` is the maximum number of `target token` decoding steps $u \le U$ per acoustic timestep $t \le T$. Note that during training, this was implicitly constrained by the shape of the joint matrix (max_symbols = $U$). However, there is no such $U$ upper bound during inference (we don't have the ground truth $U$).\n",
    "\n",
-    "So we explicitly set a heuristic upper bound on how many decoding steps can be performed per acoustic timestep. Generally a value of 5 and above is suffcient."
+    "So we explicitly set a heuristic upper bound on how many decoding steps can be performed per acoustic timestep. Generally, a value of 5 or above is sufficient."
   ]
  },
 {
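To make the `max_symbols` bound in the final hunk concrete, here is a rough sketch of a greedy Transducer decoding loop with the per-timestep cap. It is illustrative only; `prediction_net`, `joint`, and `blank_id` are assumed stand-ins, not NeMo's decoding API:

```python
def greedy_decode(enc, prediction_net, joint, blank_id, max_symbols=5):
    """enc: (1, T, D) acoustic output for a single utterance (a torch tensor)."""
    hyp, state = [], None
    for t in range(enc.size(1)):           # acoustic timesteps t <= T
        for _ in range(max_symbols):       # heuristic cap: at most max_symbols labels per t
            g, next_state = prediction_net(hyp, state)
            k = int(joint(enc[:, t], g).argmax(dim=-1))
            if k == blank_id:              # blank emitted: advance to the next timestep
                break
            hyp.append(k)                  # non-blank: emit token and stay at timestep t
            state = next_state
    return hyp
```

Without the inner cap, a model that never emits blank at some timestep would loop forever; `max_symbols` guarantees termination while rarely truncating real transcriptions.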