Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update japan_dict.txt #13142

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Update japan_dict.txt #13142

wants to merge 1 commit into from

Conversation

madmalkav
Copy link

Update japan_dict.txt to include missing jouyou kanji ( #12940 )

Update japan_dict.txt to include missing jouyou kanji
@CLAassistant
Copy link

CLAassistant commented Jun 20, 2024

CLA assistant check
All committers have signed the CLA.

@GreatV
Copy link
Collaborator

GreatV commented Jun 20, 2024

Will this affect the previous jp model?

@madmalkav
Copy link
Author

It is my understanding this file is only used when training a new model, so I don't think it would have any effect until a new model is trained, but there is a chance I misunderstood everything wrong and it works in a different way 😅

Copy link
Collaborator

@GreatV GreatV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they use the same character dictionary, using the previous model may result in inconsistent output, and I think it would be better to use a different filename (e.g., ja_ext_dict.txt).

@GreatV GreatV requested a review from jzhang533 June 20, 2024 14:02
@madmalkav
Copy link
Author

If this file can indeed affect the current model, I would propose to hold this PR until a new version of the japanese model is going to be trained. I don't like too much the idea of creating a new file and called it "extended" because this is not really extending the dictionary, is fixing a fault in it.

@GreatV
Copy link
Collaborator

GreatV commented Jun 20, 2024

@madmalkav, That makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor language requests Multilingual language requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants