Hi, just out of curiosity: I'm building a dictionary app and ran into this problem.
Basically, there are some duplicated characters in Unicode (I don't know how one would input them, but they have different code points than the ones produced by a standard Japanese keyboard).
When you search on jisho.org with this alternative version of a character, you don't find anything.
So I don't know whether this impacts regular users, or whether it is standard practice to convert these characters (i.e. normalization, https://unicode.org/faq/normalization.html).
I read up a bit on this in CJKV Information Processing (page 167+). Since these characters are subject to Unicode Normalization they might get automatically normalized by OS/browsers. So I did some testing on this to see if that normalization is applied before reaching Jisho.
Safari on macOS - normalized
Firefox on macOS - not normalized
Chrome on macOS - not normalized
Edge on Win11 (virtualized on macOS) - normalized
Chrome on Win11 (virtualized on macOS) - normalized
So it seems that in most cases (three of the five combinations above) the compatibility ideograph is normalized to the unified version before it reaches Jisho, and the search gives results as expected.
But since that is not the case for some OS/browser combos, I will consider adding a normalization step in the new version of Jisho that I'm working on.
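For anyone who wants to see what such a normalization step could look like, here is a minimal sketch in Python. It is not Jisho's code; the function name `normalize_query` is made up for illustration. It relies on the fact that CJK compatibility ideographs carry canonical decompositions, so NFC is enough to map them to the unified ideograph.

```python
import unicodedata


def normalize_query(query: str) -> str:
    """Return the search query in NFC form.

    Hypothetical helper, not part of Jisho. CJK compatibility
    ideographs decompose canonically to their unified ideograph,
    so NFC replaces them without touching ordinary kanji or kana.
    """
    return unicodedata.normalize("NFC", query)


# U+F90A is the compatibility ideograph, U+91D1 the unified one.
compat = "\uF90A"
unified = "\u91D1"
print(normalize_query(compat) == unified)
```

Applying this to the query string before the lookup would make both forms hit the same entries, regardless of what the browser did.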
Thanks for pointing this out. I hope this helps your development efforts as well :)
I'll add that I think I have gotten one other question about these compatibility ideographs over the close to 20 years of running Jisho, so I don't think it's a very common issue.
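Example 1
Search for 金 as the compatibility ideograph U+F90A: https://jisho.org/search/%EF%A4%8A No results
Search for 金 as the unified ideograph U+91D1: https://jisho.org/search/%E9%87%91 997 results
Example 2
Search for 車 as the compatibility ideograph U+F902: https://jisho.org/search/%EF%A4%82 No results
Search for 車 as the unified ideograph U+8ECA: https://jisho.org/search/%E8%BB%8A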
You can browse the full list of these characters here:
https://en.wikipedia.org/wiki/CJK_Compatibility_Ideographs
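Since the two forms look identical on screen, it can help to decode the percent-encoded part of a search URL and print the code points. A small Python sketch (the helper name `code_points` is just for illustration):

```python
from urllib.parse import unquote


def code_points(percent_encoded: str) -> list[str]:
    """Decode a percent-encoded UTF-8 string and list its code points."""
    return [f"U+{ord(ch):04X}" for ch in unquote(percent_encoded)]


# Path segments taken from the example search URLs.
print(code_points("%EF%A4%8A"))  # compatibility form from Example 1
print(code_points("%E9%87%91"))  # unified form from Example 1
```

Anything in the range U+F900 to U+FAFF belongs to the CJK Compatibility Ideographs block.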