Commit
Fix typos (#6523)
* Fix typos

Signed-off-by: smajumdar <[email protected]>

* Fix typos

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
(cherry picked from commit 5468077)
titu1994 committed May 2, 2023
1 parent 60aebb0 commit 1bc95a4
Showing 6 changed files with 12 additions and 12 deletions.
10 changes: 5 additions & 5 deletions tutorials/asr/ASR_CTC_Language_Finetuning.ipynb
@@ -539,8 +539,8 @@
 "import matplotlib.pyplot as plt\n",
 "\n",
 "plt.bar(x=TOKEN_COUNT_X, height=NUM_TOKENS_Y)\n",
-"plt.title(\"Occurance of unique tokens in train+dev set\")\n",
-"plt.xlabel(\"# of occurances\")\n",
+"plt.title(\"Occurrences of unique tokens in train+dev set\")\n",
+"plt.xlabel(\"# of occurrences\")\n",
 "plt.ylabel(\"# of tokens\")\n",
 "plt.xlim(0, MAX_COUNT);"
 ],
@@ -564,13 +564,13 @@
 "source": [
 "UNCOMMON_TOKENS_COUNT = 5\n",
 "\n",
-"chars_with_infrequent_occurance = set()\n",
+"chars_with_infrequent_occurrence = set()\n",
 "for count in range(1, UNCOMMON_TOKENS_COUNT + 1):\n",
 " if count in train_counts:\n",
 " token_list = train_counts[count]\n",
-" chars_with_infrequent_occurance.update(set(token_list))\n",
+" chars_with_infrequent_occurrence.update(set(token_list))\n",
 "\n",
-"print(f\"Number of tokens with <= {UNCOMMON_TOKENS_COUNT} occurances : {len(chars_with_infrequent_occurance)}\")"
+"print(f\"Number of tokens with <= {UNCOMMON_TOKENS_COUNT} occurrences : {len(chars_with_infrequent_occurrence)}\")"
 ],
 "execution_count": null,
 "outputs": []
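Note on the cells touched above: they rely on a `train_counts` mapping from occurrence count to the tokens seen that many times, plus `TOKEN_COUNT_X`/`NUM_TOKENS_Y` for the histogram. A minimal sketch of how such structures could be built from a list of transcript strings (the helper name and the character-level tokenization are assumptions, not taken from the notebook):

```python
from collections import Counter, defaultdict

def build_token_counts(transcripts):
    """Hypothetical helper: group tokens by how often they occur in the corpus."""
    freq = Counter()
    for text in transcripts:
        freq.update(text)  # character-level tokens; swap in a tokenizer for subwords

    # occurrence count -> list of tokens that appear exactly that many times
    train_counts = defaultdict(list)
    for token, count in freq.items():
        train_counts[count].append(token)

    # Bar-chart inputs: for each occurrence count, how many distinct tokens have it
    token_count_x = sorted(train_counts)
    num_tokens_y = [len(train_counts[c]) for c in token_count_x]
    return dict(train_counts), token_count_x, num_tokens_y
```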
4 changes: 2 additions & 2 deletions tutorials/asr/ASR_with_Subword_Tokenization.ipynb
@@ -311,7 +311,7 @@
 "\r\n",
 " - Sophisticated subword tokenization algorithms build their vocabularies based on large text corpora. To accurately tokenize such large volumes of text with minimal vocabulary size, the subwords that are learned inherently model the interdependency between tokens of that language to some degree. \r\n",
 " \r\n",
-"Looking at the previous example, the token `hel##` is a single token that represents the relationship `h` => `e` => `l`. When the model predicts the singe token `hel##`, it implicitly predicts this relationship - even though the subsequent token can be either `l` (for `hell`) or `##lo` (for `hello`) and is predicted independently of the previous token!\r\n",
+"Looking at the previous example, the token `hel##` is a single token that represents the relationship `h` => `e` => `l`. When the model predicts the single token `hel##`, it implicitly predicts this relationship - even though the subsequent token can be either `l` (for `hell`) or `##lo` (for `hello`) and is predicted independently of the previous token!\r\n",
 "\r\n",
 " - By reducing the target sentence length by subword tokenization (target sentence here being the characters/subwords transcribed from the audio signal), we entirely sidestep the sequence length limitation of CTC loss!\r\n",
 "\r\n",
@@ -553,7 +553,7 @@
 "\r\n",
 " - `--spe_sample_size`: If the dataset is too large, consider using a sampled dataset indicated by a positive integer. By default, any negative value (default = -1) will use the entire dataset.\r\n",
 "\r\n",
-" - `--spe_train_extremely_large_corpus`: When training a sentencepiece tokenizer on very large amounts of text, sometimes the tokenizer will run out of memory or wont be able to process so much data on RAM. At some point you might receive the following error - \"Input corpus too large, try with train_extremely_large_corpus=true\". If your machine has large amounts of RAM, it might still be possible to build the tokenizer using the above flag. Will silently fail if it runs out of RAM.\r\n",
+" - `--spe_train_extremely_large_corpus`: When training a sentencepiece tokenizer on very large amounts of text, sometimes the tokenizer will run out of memory or won't be able to process so much data on RAM. At some point you might receive the following error - \"Input corpus too large, try with train_extremely_large_corpus=true\". If your machine has large amounts of RAM, it might still be possible to build the tokenizer using the above flag. Will silently fail if it runs out of RAM.\r\n",
 "\r\n",
 " - `--log`: Whether the script should display log messages"
 ]
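For reference, the `--spe_*` flags described above are forwarded to SentencePiece training. A rough sketch of the underlying call that `--spe_train_extremely_large_corpus` corresponds to (corpus path, vocab size, and model type are placeholder assumptions, not values from the tutorial):

```python
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",                 # assumed plain-text corpus, one sentence per line
    model_prefix="tokenizer",           # writes tokenizer.model / tokenizer.vocab
    vocab_size=1024,
    model_type="bpe",
    train_extremely_large_corpus=True,  # lifts the "Input corpus too large" error
)
```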
2 changes: 1 addition & 1 deletion tutorials/asr/Buffered_Transducer_Inference.ipynb
@@ -805,7 +805,7 @@
 " print(\"\\nGreedy labels collected from this buffer\")\n",
 " print(tok[len(tok) - 1 - delay:len(tok) - 1 - delay + tokens_per_chunk]) \n",
 " self.toks_unmerged += tok[len(tok) - 1 - delay:len(tok) - 1 - delay + tokens_per_chunk]\n",
-" print(\"\\nTokens collected from succesive buffers before RNNT merge\")\n",
+" print(\"\\nTokens collected from successive buffers before RNNT merge\")\n",
 " print(self.toks_unmerged)\n",
 "\n",
 " output = []\n",
2 changes: 1 addition & 1 deletion tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb
@@ -439,7 +439,7 @@
 " Arg:\n",
 " wav_file: wave file to be performed inference on.\n",
 " STEP: infer every STEP seconds \n",
-" WINDOW_SIZE : lenght of audio to be sent to NN.\n",
+" WINDOW_SIZE : length of audio to be sent to NN.\n",
 " \"\"\"\n",
 " \n",
 " FRAME_LEN = STEP \n",
2 changes: 1 addition & 1 deletion tutorials/asr/Streaming_ASR.ipynb
@@ -537,7 +537,7 @@
 " print(\"\\nGreedy labels collected from this buffer\")\n",
 " print(tok[len(tok) - 1 - delay:len(tok) - 1 - delay + self.n_tokens_per_chunk]) \n",
 " self.toks_unmerged += tok[len(tok) - 1 - delay:len(tok) - 1 - delay + self.n_tokens_per_chunk]\n",
-" print(\"\\nTokens collected from succesive buffers before CTC merge\")\n",
+" print(\"\\nTokens collected from successive buffers before CTC merge\")\n",
 " print(self.toks_unmerged)\n",
 "\n",
 "\n",
4 changes: 2 additions & 2 deletions tutorials/asr/asr_adapters/ASR_with_Adapters.ipynb
@@ -664,7 +664,7 @@
 "\n",
 "For this experiment we will continue to use the original spec augmentation config in the base model, however you may find better results by modifying the strength of this augmentation.\n",
 "\n",
-"**Note**: The script inside ASR examples **disables spec augment entirely**. This is done in order to provide a stable default to measure the best possible adaptation case, but may severely degrade the performance on general speech. Please be careful when copying the hyper parameters from the tutorial to the script for large scale experimentatin."
+"**Note**: The script inside ASR examples **disables spec augment entirely**. This is done in order to provide a stable default to measure the best possible adaptation case, but may severely degrade the performance on general speech. Please be careful when copying the hyper parameters from the tutorial to the script for large scale experimentation."
 ],
 "metadata": {
 "id": "T3VuqcGTNuIJ"
@@ -803,7 +803,7 @@
 "source": [
 "-----\n",
 "\n",
-"As you can see, a single component of the model may support one or more adapter types (or none at all)! Below, we will experiment with the simple Linear Adapters, but as an excercise, you might try to use other adapter types present here."
+"As you can see, a single component of the model may support one or more adapter types (or none at all)! Below, we will experiment with the simple Linear Adapters, but as an exercise, you might try to use other adapter types present here."
 ],
 "metadata": {
 "id": "YXTC4LiSnB2O"
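The adapter cells touched above follow NeMo's adapter-mixin pattern; a rough sketch of adding and enabling a single Linear Adapter is below. The import paths, config fields, and checkpoint name are assumptions based on that API, not part of this commit.

```python
from nemo.collections.asr.models import ASRModel
from nemo.collections.common.parts import adapter_modules

# Placeholder checkpoint; any adapter-compatible NeMo ASR model would do.
model = ASRModel.from_pretrained("stt_en_conformer_ctc_small")

# Assumed config class/fields: a small bottleneck adapter matching the encoder width.
adapter_cfg = adapter_modules.LinearAdapterConfig(
    in_features=model.cfg.encoder.d_model,
    dim=32,
)
model.add_adapter(name="my_adapter", cfg=adapter_cfg)
model.set_enabled_adapters(name="my_adapter", enabled=True)

# Train only the adapter weights: freeze the base model, then unfreeze adapters.
model.freeze()
model.unfreeze_enabled_adapters()
```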
