The list of languages mentions Chinese, which is imprecise for spoken languages that may share a writing system. Was the model trained on spoken Mandarin, Cantonese, Hakka, something else, or some or all of these?
Replies: 1 comment 5 replies
When using `zh` (Chinese) as the language label, our `medium` and `large` models showed ~8% WER and a BLEU score of ~13 on the `yue_hant_hk` split of the Fleurs dataset, suggesting that it works quite decently on Cantonese and possibly on other dialects as well. We'd be interested in knowing how usable it is for the other Sinitic languages.
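
For anyone who wants to try this on another Sinitic language, here is a minimal sketch of forcing the `zh` label and scoring the output against a reference transcript. It assumes the `openai-whisper` Python package; the helper names (`edit_distance`, `error_rate`, `transcribe_with_zh_label`) are made up for illustration. The scoring is only an approximation: the exact normalization and tokenization behind the numbers above are not stated here, and scoring per character (rather than per whitespace-separated word) is my assumption for text written without spaces.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="char"):
    """Edit distance divided by reference length.

    unit="char" scores per non-space character (an assumption that suits
    text written without spaces); unit="word" splits on whitespace.
    """
    if unit == "char":
        ref = [c for c in reference if not c.isspace()]
        hyp = [c for c in hypothesis if not c.isspace()]
    else:
        ref = reference.split()
        hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def transcribe_with_zh_label(audio_path, model_name="medium"):
    """Transcribe one file with the language forced to zh.

    Hypothetical helper; whisper is imported lazily so the scoring
    functions above can be used without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, language="zh")
    return result["text"]
```

To use it, call `transcribe_with_zh_label` on a recording and pass the result to `error_rate` together with a reference transcript. Note that a script mismatch (for example a Simplified character in the output where the reference has the Traditional one) counts as an error under this scoring, so some normalization may be needed before comparing.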