Fix looking up OOVs in lexicon.txt for MeloTTS models. (#1266)
If an English word does not exist in the lexicon, we split
it into characters. For instance, if the word TTS does not
exist in lexicon.txt, we split it into 3 characters T, T, and S.
csukuangfj authored Aug 16, 2024
1 parent 63713ec commit 9dcea49
Showing 1 changed file with 16 additions and 0 deletions.
16 changes: 16 additions & 0 deletions sherpa-onnx/csrc/melo-tts-lexicon.cc
@@ -136,6 +136,22 @@ class MeloTtsLexicon::Impl {
ans.tokens.insert(ans.tokens.end(), ids.tokens.begin(),
ids.tokens.end());
ans.tones.insert(ans.tones.end(), ids.tones.begin(), ids.tones.end());
} else {
// If the lexicon does not contain the word, we split the word into
// characters.
//
// For instance, if the word is TTS and it does not exist
// in the lexicon, we split it into 3 characters: T T S
std::string s;
for (char c : word) {
s = c;
if (word2ids_.count(s)) {
const auto &t = word2ids_.at(s);
ans.tokens.insert(ans.tokens.end(), t.tokens.begin(),
t.tokens.end());
ans.tones.insert(ans.tones.end(), t.tones.begin(), t.tones.end());
}
}
}
}
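
The fallback added in this hunk can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the actual sherpa-onnx code: the `TokenIDs` struct, the `Lexicon` alias, and the `AppendWord` helper are hypothetical names standing in for the real types behind `word2ids_` and `ans`. It assumes the lexicon maps a string key to a pair of token and tone ID lists, and that keys use the same letter case as the input word.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the per-entry data held in the lexicon.
struct TokenIDs {
  std::vector<int64_t> tokens;
  std::vector<int64_t> tones;
};

// Hypothetical alias; the real member in the diff is word2ids_.
using Lexicon = std::unordered_map<std::string, TokenIDs>;

// Appends the token and tone IDs of src to dst.
static void Append(const TokenIDs &src, TokenIDs *dst) {
  dst->tokens.insert(dst->tokens.end(), src.tokens.begin(), src.tokens.end());
  dst->tones.insert(dst->tones.end(), src.tones.begin(), src.tones.end());
}

// Looks up word in the lexicon. If the whole word is missing, falls back to
// looking up each character on its own, e.g. "tts" -> "t", "t", "s".
// Characters that are also missing from the lexicon are skipped, which
// mirrors the behavior of the loop in the diff above.
void AppendWord(const Lexicon &lexicon, const std::string &word,
                TokenIDs *ans) {
  auto it = lexicon.find(word);
  if (it != lexicon.end()) {
    Append(it->second, ans);
    return;
  }

  for (char c : word) {
    auto ci = lexicon.find(std::string(1, c));
    if (ci != lexicon.end()) {
      Append(ci->second, ans);
    }
  }
}
```

Under these assumptions, a lexicon holding only the entries `t` and `s` would let `AppendWord(lexicon, "tts", &ans)` append the IDs for `t`, `t`, and `s` in that order. The sketch uses a single `find` per lookup where the diff uses `count` followed by `at`; the result is the same, with one hash lookup fewer per character. Because the loop walks the word byte by byte, the fallback only makes sense for single-byte characters such as ASCII letters.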
