Update japan_dict.txt #13142
base: main
Update japan_dict.txt to include missing jouyou kanji
Will this affect the previous jp model?
It is my understanding that this file is only used when training a new model, so I don't think it would have any effect until a new model is trained, but there is a chance I misunderstood and it works in a different way 😅
Since the previous model uses the same character dictionary, running it with the updated file may result in inconsistent output, so I think it would be better to use a different filename (e.g., ja_ext_dict.txt).
If this file can indeed affect the current model, I would propose holding this PR until a new version of the Japanese model is trained. I don't much like the idea of creating a new file and calling it "extended", because this is not really extending the dictionary; it is fixing a fault in it.
@madmalkav, That makes sense.
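To make the concern in this thread concrete, here is a minimal sketch of why editing a character dictionary can change the output of an already trained recognizer. It assumes the dictionary maps each line's position to a class index, which is a common convention for text recognition models but is not confirmed in this thread for this project. The names `load_dict` and `decode`, and the tiny three-character dictionary, are hypothetical and only for illustration.

```python
def load_dict(lines):
    """Map class index -> character, one character per line, in file order.

    Assumption: the recognizer identifies characters by line position.
    """
    return {i: ch for i, ch in enumerate(lines)}


def decode(indices, index_to_char):
    """Turn a sequence of class indices into text; unknown indices become '?'."""
    return "".join(index_to_char.get(i, "?") for i in indices)


# Hypothetical stand-in for the original dictionary file.
old_dict = ["日", "本", "語"]
# Same dictionary with a new character inserted in the middle.
inserted = ["日", "亜", "本", "語"]
# Same dictionary with the new character appended at the end.
appended = ["日", "本", "語", "亜"]

# Indices a model trained against old_dict would emit for "日本語".
model_output = [0, 1, 2]

old_text = decode(model_output, load_dict(old_dict))
inserted_text = decode(model_output, load_dict(inserted))
appended_text = decode(model_output, load_dict(appended))

print(old_text, inserted_text, appended_text)
```

Under that assumption, inserting characters in the middle of the file shifts every later index, so an existing model's predictions decode to the wrong characters. Appending keeps existing indices stable, but the number of classes would still no longer match the model's output layer, which is one reason to wait for a retrained model or keep the old file alongside the new one.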
Update japan_dict.txt to include missing jouyou kanji (#12940)