Word Count in content structures does not count Chinese words properly #13796
Comments
It looks like this happens because in some languages words are not separated by spaces. For example, 这是鸟 means "This is a bird"; it is 3 words without a single space character.
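A minimal sketch of the failure mode described above, assuming the counter splits text on whitespace. The thread does not show the editor's actual counting code, so `countWordsBySpaces` is a hypothetical name used only for illustration:

```typescript
// Hypothetical whitespace-based word counter (illustration only,
// not the editor's real implementation).
function countWordsBySpaces(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") {
    return 0;
  }
  // Every run of whitespace is treated as one word boundary.
  return trimmed.split(/\s+/).length;
}

console.log(countWordsBySpaces("This is a bird")); // 4
console.log(countWordsBySpaces("这是鸟")); // 1, since there is no space to split on
```

With this approach a whole Chinese sentence collapses into a single "word", which would match the low counts reported in this issue.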
@jorgefilipecosta I think there may be two ways to fix this bug. One way is to follow atom-word-counter and present both the word count (based on English-style words) and the character count (all kinds of characters, excluding whitespace). This requires only small changes to the UI.
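The first option could be sketched roughly as follows. This is an assumption about what "words plus characters excluding whitespace" might look like, not the atom-word-counter code; `countWordsAndCharacters` and the `TextCounts` shape are hypothetical:

```typescript
// Hypothetical result shape for showing both counts in the UI.
interface TextCounts {
  words: number;
  characters: number;
}

// Counts whitespace-separated words and non-whitespace characters.
// Note: `length` counts UTF-16 code units, so characters outside the
// Basic Multilingual Plane (for example many emoji) count as 2.
function countWordsAndCharacters(text: string): TextCounts {
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  const characters = text.replace(/\s/g, "").length;
  return { words, characters };
}

console.log(countWordsAndCharacters("This is a bird")); // { words: 4, characters: 11 }
console.log(countWordsAndCharacters("这是鸟")); // { words: 1, characters: 3 }
```

The word count for Chinese text stays wrong under this option, but the character count gives writers a number that is meaningful regardless of language.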
Thank you for summarizing and sharing your thoughts @Jackie6.
Great ticket. It seems like the two options presented appear to be the "easy" version (count words and characters), and the hard version (be aware of the language when counting words). It seems like the latter is the better user experience, but it could be so difficult that unless we get solid pull requests it may take a while for this to appear. Whereas for the former, it's probably both easy to build, and a character count could likely be useful regardless of language. Keeping in mind we mean to merge the Document Outline tool with the Block Navigation tool, we could possibly build solution 1 at the same time, and then consider upgrading to version 2 at a later time?
@sandymcfadden, there is #14589 opened with a proposal of how to resolve this issue as suggested in the discussion above.
Related: #24823 was merged, but it seems this issue is still relevant.
@swissspidy Ugh, this is a difficult topic.
There was a brief discussion about this on #43403, and one idea I had is to use the Unicode character ranges of the written content to determine how words or time to read are calculated. I found a library that seems to use that approach: https://github.com/ngryman/reading-time.
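A rough sketch of the Unicode-range idea, under stated assumptions: each character in a few common CJK blocks counts as one word, and everything else is counted as whitespace-separated words. This is not the algorithm used by the linked library; the ranges are deliberately incomplete, punctuation is not handled specially, and `isCjkCodeUnit` and `countMixedWords` are hypothetical names:

```typescript
// Assumed, non-exhaustive CJK ranges (all within the Basic Multilingual Plane):
// Hiragana and Katakana (U+3040 to U+30FF), CJK Unified Ideographs (U+4E00 to U+9FFF).
function isCjkCodeUnit(code: number): boolean {
  return (
    (code >= 0x3040 && code <= 0x30ff) ||
    (code >= 0x4e00 && code <= 0x9fff)
  );
}

// Counts each CJK character as one word and each whitespace-delimited
// run of other characters as one word.
function countMixedWords(text: string): number {
  let count = 0;
  let inWord = false; // true while inside a non-CJK word
  for (let i = 0; i < text.length; i++) {
    if (isCjkCodeUnit(text.charCodeAt(i))) {
      count += 1;
      inWord = false;
    } else if (/\s/.test(text.charAt(i))) {
      inWord = false;
    } else if (!inWord) {
      count += 1;
      inWord = true;
    }
  }
  return count;
}

console.log(countMixedWords("This is a bird")); // 4
console.log(countMixedWords("这是鸟")); // 3
console.log(countMixedWords("Hello 这是鸟 world")); // 5
```

Counting one word per character is only an approximation, since many Chinese words span more than one character; a language-aware segmenter would be needed for an exact count.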
Describe the bug
When writing a post in Chinese the word count shown in the content structure does not show an accurate word count.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
When using the same content in a word processor like Pages, the word count is significantly larger than the one shown in Gutenberg. The expected behavior would be to have an accurate word count independent of the language used.
Screenshots
You can see a large amount of content, but it is showing only 10 words.
Desktop (please complete the following information):