Add support for ISO-639 language codes #106
Currently, the language of the text to summarize has to be specified as a language name such as "german" or "french". However, many tools, such as Apache Tika, output ISO-639 language codes, which makes it difficult to integrate sumy with the wider natural language processing ecosystem.
This commit ensures that sumy understands language codes passed as ISO-639, in both two-letter format (e.g. "de" or "fr") and three-letter format (e.g. "ger" or "fra").
Resolves #96
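For readers skimming the thread, here is a minimal sketch of the kind of normalization described above. It is not the code from this pull request: the table `ISO_639_TO_NAME` and the function `normalize_language` are hypothetical names, and the table covers only a few languages for illustration. Note that ISO 639-2 defines two three-letter variants for some languages (for German, "ger" and "deu"; for French, "fre" and "fra"), so the sketch accepts both.

```python
# Hypothetical lookup table, for illustration only; not sumy's actual data.
# Maps two-letter (ISO 639-1) and three-letter (ISO 639-2) codes to the
# lowercase language names mentioned in the description.
ISO_639_TO_NAME = {
    "de": "german", "ger": "german", "deu": "german",
    "fr": "french", "fre": "french", "fra": "french",
    "en": "english", "eng": "english",
}


def normalize_language(language):
    """Return a lowercase language name for a name or an ISO-639 code.

    Unknown values are returned lowercased and otherwise unchanged, so
    existing callers that already pass "german" or "french" keep working.
    """
    key = language.strip().lower()
    return ISO_639_TO_NAME.get(key, key)


print(normalize_language("de"))      # two-letter code
print(normalize_language("fra"))     # three-letter code
print(normalize_language("German"))  # plain language name
```

Falling back to the input value keeps the change backward compatible: a caller that passes a full language name is unaffected, and an unsupported code is passed through for later code to reject.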
Thanks for the code. I really appreciate it. But please take a look at my comments.
Thanks for the review. I have addressed all the comments.
Thanks a lot. Good work 👍
Thanks for the merge!