Commit
Fix typos (#6523)
* Fix typos

Signed-off-by: smajumdar <[email protected]>

* Fix typos

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
(cherry picked from commit 5468077)
titu1994 committed May 2, 2023
1 parent 60aebb0 commit 1bc95a4
Showing 6 changed files with 12 additions and 12 deletions.
10 changes: 5 additions & 5 deletions tutorials/asr/ASR_CTC_Language_Finetuning.ipynb
@@ -539,8 +539,8 @@
 "import matplotlib.pyplot as plt\n",
 "\n",
 "plt.bar(x=TOKEN_COUNT_X, height=NUM_TOKENS_Y)\n",
-"plt.title(\"Occurance of unique tokens in train+dev set\")\n",
-"plt.xlabel(\"# of occurances\")\n",
+"plt.title(\"Occurrences of unique tokens in train+dev set\")\n",
+"plt.xlabel(\"# of occurrences\")\n",
 "plt.ylabel(\"# of tokens\")\n",
 "plt.xlim(0, MAX_COUNT);"
 ],
@@ -564,13 +564,13 @@
 "source": [
 "UNCOMMON_TOKENS_COUNT = 5\n",
 "\n",
-"chars_with_infrequent_occurance = set()\n",
+"chars_with_infrequent_occurrence = set()\n",
 "for count in range(1, UNCOMMON_TOKENS_COUNT + 1):\n",
 " if count in train_counts:\n",
 " token_list = train_counts[count]\n",
-" chars_with_infrequent_occurance.update(set(token_list))\n",
+" chars_with_infrequent_occurrence.update(set(token_list))\n",
 "\n",
-"print(f\"Number of tokens with <= {UNCOMMON_TOKENS_COUNT} occurances : {len(chars_with_infrequent_occurance)}\")"
+"print(f\"Number of tokens with <= {UNCOMMON_TOKENS_COUNT} occurrences : {len(chars_with_infrequent_occurrence)}\")"
 ],
 "execution_count": null,
 "outputs": []
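Note on the cells touched above: they rely on a `train_counts` mapping from occurrence count to the tokens seen that many times, plus `TOKEN_COUNT_X`/`NUM_TOKENS_Y` for the histogram. A minimal sketch of how such structures could be built from a list of transcript strings (the helper name and the character-level tokenization are assumptions, not taken from the notebook):

```python
from collections import Counter, defaultdict

def build_token_counts(transcripts):
    """Hypothetical helper: group tokens by how often they occur in the corpus."""
    freq = Counter()
    for text in transcripts:
        freq.update(text)  # character-level tokens; swap in a tokenizer for subwords

    # occurrence count -> list of tokens that appear exactly that many times
    train_counts = defaultdict(list)
    for token, count in freq.items():
        train_counts[count].append(token)

    # Bar-chart inputs: for each occurrence count, how many distinct tokens have it
    token_count_x = sorted(train_counts)
    num_tokens_y = [len(train_counts[c]) for c in token_count_x]
    return dict(train_counts), token_count_x, num_tokens_y
```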
4 changes: 2 additions & 2 deletions tutorials/asr/ASR_with_Subword_Tokenization.ipynb
@@ -311,7 +311,7 @@
 "\r\n",
 " - Sophisticated subword tokenization algorithms build their vocabularies based on large text corpora. To accurately tokenize such large volumes of text with minimal vocabulary size, the subwords that are learned inherently model the interdependency between tokens of that language to some degree. \r\n",
 " \r\n",
-"Looking at the previous example, the token `hel##` is a single token that represents the relationship `h` => `e` => `l`. When the model predicts the singe token `hel##`, it implicitly predicts this relationship - even though the subsequent token can be either `l` (for `hell`) or `##lo` (for `hello`) and is predicted independently of the previous token!\r\n",
+"Looking at the previous example, the token `hel##` is a single token that represents the relationship `h` => `e` => `l`. When the model predicts the single token `hel##`, it implicitly predicts this relationship - even though the subsequent token can be either `l` (for `hell`) or `##lo` (for `hello`) and is predicted independently of the previous token!\r\n",
 "\r\n",
 " - By reducing the target sentence length by subword tokenization (target sentence here being the characters/subwords transcribed from the audio signal), we entirely sidestep the sequence length limitation of CTC loss!\r\n",
 "\r\n",
@@ -553,7 +553,7 @@
 "\r\n",
 " - `--spe_sample_size`: If the dataset is too large, consider using a sampled dataset indicated by a positive integer. By default, any negative value (default = -1) will use the entire dataset.\r\n",
 "\r\n",
-" - `--spe_train_extremely_large_corpus`: When training a sentencepiece tokenizer on very large amounts of text, sometimes the tokenizer will run out of memory or wont be able to process so much data on RAM. At some point you might receive the following error - \"Input corpus too large, try with train_extremely_large_corpus=true\". If your machine has large amounts of RAM, it might still be possible to build the tokenizer using the above flag. Will silently fail if it runs out of RAM.\r\n",
+" - `--spe_train_extremely_large_corpus`: When training a sentencepiece tokenizer on very large amounts of text, sometimes the tokenizer will run out of memory or won't be able to process so much data on RAM. At some point you might receive the following error - \"Input corpus too large, try with train_extremely_large_corpus=true\". If your machine has large amounts of RAM, it might still be possible to build the tokenizer using the above flag. Will silently fail if it runs out of RAM.\r\n",
 "\r\n",
 " - `--log`: Whether the script should display log messages"
 ]
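For reference, the `--spe_*` flags described above are forwarded to SentencePiece training. A rough sketch of the underlying call that `--spe_train_extremely_large_corpus` corresponds to (corpus path, vocab size, and model type are placeholder assumptions, not values from the tutorial):

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                 # assumed plain-text corpus, one sentence per line
    model_prefix="tokenizer",           # writes tokenizer.model / tokenizer.vocab
    vocab_size=1024,
    model_type="bpe",
    train_extremely_large_corpus=True,  # lifts the "Input corpus too large" error
)
```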
2 changes: 1 addition & 1 deletion tutorials/asr/Buffered_Transducer_Inference.ipynb
@@ -805,7 +805,7 @@
 " print(\"\\nGreedy labels collected from this buffer\")\n",
 " print(tok[len(tok) - 1 - delay:len(tok) - 1 - delay + tokens_per_chunk]) \n",
 " self.toks_unmerged += tok[len(tok) - 1 - delay:len(tok) - 1 - delay + tokens_per_chunk]\n",
-" print(\"\\nTokens collected from succesive buffers before RNNT merge\")\n",
+" print(\"\\nTokens collected from successive buffers before RNNT merge\")\n",
 " print(self.toks_unmerged)\n",
 "\n",
 " output = []\n",
2 changes: 1 addition & 1 deletion tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb
@@ -439,7 +439,7 @@
 " Arg:\n",
 " wav_file: wave file to be performed inference on.\n",
 " STEP: infer every STEP seconds \n",
-" WINDOW_SIZE : lenght of audio to be sent to NN.\n",
+" WINDOW_SIZE : length of audio to be sent to NN.\n",
 " \"\"\"\n",
 " \n",
 " FRAME_LEN = STEP \n",
2 changes: 1 addition & 1 deletion tutorials/asr/Streaming_ASR.ipynb
@@ -537,7 +537,7 @@
 " print(\"\\nGreedy labels collected from this buffer\")\n",
 " print(tok[len(tok) - 1 - delay:len(tok) - 1 - delay + self.n_tokens_per_chunk]) \n",
 " self.toks_unmerged += tok[len(tok) - 1 - delay:len(tok) - 1 - delay + self.n_tokens_per_chunk]\n",
-" print(\"\\nTokens collected from succesive buffers before CTC merge\")\n",
+" print(\"\\nTokens collected from successive buffers before CTC merge\")\n",
 " print(self.toks_unmerged)\n",
 "\n",
 "\n",
4 changes: 2 additions & 2 deletions tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb
@@ -664,7 +664,7 @@
 "\n",
 "For this experiment we will continue to use the original spec augmentation config in the base model, however you may find better results by modifying the strength of this augmentation.\n",
 "\n",
-"**Note**: The script inside ASR examples **disables spec augment entirely**. This is done in order to provide a stable default to measure the best possible adaptation case, but may severely degrade the performance on general speech. Please be careful when copying the hyper parameters from the tutorial to the script for large scale experimentatin."
+"**Note**: The script inside ASR examples **disables spec augment entirely**. This is done in order to provide a stable default to measure the best possible adaptation case, but may severely degrade the performance on general speech. Please be careful when copying the hyper parameters from the tutorial to the script for large scale experimentation."
 ],
 "metadata": {
 "id": "T3VuqcGTNuIJ"
@@ -803,7 +803,7 @@
 "source": [
 "-----\n",
 "\n",
-"As you can see, a single component of the model may support one or more adapter types (or none at all)! Below, we will experiment with the simple Linear Adapters, but as an excercise, you might try to use other adapter types present here."
+"As you can see, a single component of the model may support one or more adapter types (or none at all)! Below, we will experiment with the simple Linear Adapters, but as an exercise, you might try to use other adapter types present here."
 ],
 "metadata": {
 "id": "YXTC4LiSnB2O"
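The adapter cells touched above follow NeMo's adapter-mixin pattern; a rough sketch of adding and enabling a single Linear Adapter is below. The import paths, config fields, and checkpoint name are assumptions based on that API, not part of this commit.

```python
from nemo.collections.asr.models import ASRModel
from nemo.collections.common.parts import adapter_modules

# Placeholder checkpoint; any adapter-compatible NeMo ASR model would do.
model = ASRModel.from_pretrained("stt_en_conformer_ctc_small")

# Assumed config class/fields: a small bottleneck adapter matching the encoder width.
adapter_cfg = adapter_modules.LinearAdapterConfig(
    in_features=model.cfg.encoder.d_model,
    dim=32,
)
model.add_adapter(name="my_adapter", cfg=adapter_cfg)
model.set_enabled_adapters(name="my_adapter", enabled=True)

# Train only the adapter weights: freeze the base model, then unfreeze adapters.
model.freeze()
model.unfreeze_enabled_adapters()
```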
