Support mathematical alphanumeric symbols #11570
Comments
Any opinions on this? How would one tackle this? Would it be necessary (and enough) to edit symbols.dic and then let the translators translate it? I'm asking because there have been complaints about other screen readers spelling them: because of their misuse as decorative symbols, other screen readers pronounce "math bold / italic / whatever" before each symbol and spell them, just like eSpeak NG does with Cyrillic letters. In my opinion, however, spelling them is better than letting the synth choose (most of the time, silence or the Unicode representation), as NVDA does. |
I've solved this with an addon which uses Unidecode. Here is the addon, I hope I can submit it for review for the official addons soon. I don't know if I should close this now. |
I'm not sure whether the symbol dictionary would work for this, likely the speech dictionary might be better here. |
I don't know how the speech dictionary would work in this case. Can we create an entry for each letter? How would that be different from a symbols dictionary? Also, I believe the addon solution is more elegant now. Sure, you have to install an addon, select or copy the text which contains the symbols and press a keystroke, but at least it's less verbose than other screen readers which read the symbols one by one with their attributes. |
Hi. I just found out about NVDA, but I'm familiar with the issue of abusing Math symbols. Wouldn't it be better to detect this kind of "Unicode abuse" at a word-level instead of character by character? I'm sure there's all kinds of "fake font" texts, and probably more than mathematical ones, but in my experience the mathematical ones use these characters isolated. Would it be possible/enough to treat them as a "fake font" if there is a sequence of them? A lonely "𝕴" may be a Lie algebra, but "𝕴 𝖆𝖒 𝕲𝖗𝖔𝖔𝖙" is clearly not Math 😅 Unfortunately I don't have the programming chops nor the time to do a PR to the NVDA repo, but if I can help in any other way... |
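To make the word-level idea above concrete, here is a minimal Python sketch. It assumes that only the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF) matters and that "a sequence" means two or more adjacent symbols. The name `looks_decorative` is hypothetical and is not part of NVDA.

```python
import re

# Assumption: only the Mathematical Alphanumeric Symbols block
# (U+1D400-U+1D7FF) is considered; other "fake font" sources are ignored.
MATH_ALNUM_RUN = re.compile("[\U0001D400-\U0001D7FF]{2,}")


def looks_decorative(text: str) -> bool:
    """Return True if text contains two or more adjacent symbols from the block."""
    return MATH_ALNUM_RUN.search(text) is not None
```

With this rule a lone "𝕴" is left alone, while "𝕴 𝖆𝖒 𝕲𝖗𝖔𝖔𝖙" is flagged because of the runs "𝖆𝖒" and "𝕲𝖗𝖔𝖔𝖙". A known weakness of the sketch is that a few styled letters live outside this block (for example ℎ, U+210E PLANCK CONSTANT, which fills the gap for italic small h), so such a character would split a run.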
As a start, these symbols should be reported with their mathematical names by adding them to symbols.dic. We could eventually add an option to read them as how they are presented visually e.g. "𝕴 𝖆𝖒 𝕲𝖗𝖔𝖔𝖙" as "I am Groot" (in a cursive font). |
@atnbueno I don't think a sequence is sufficient to differentiate between how these symbols should be reported. For example, a sequence of legitimate math would not be announced in a useful way. Do you have an alternative mechanism for deciding how these should be reported? |
I think the fake font would be the best way, although it would cause issues in systems where a font doesn't exist, and would make browse mode handling difficult to do. Presumably renderers are calling up some font to handle these symbols anyway, as math writing looks very specific. If someone doesn't have font reporting on, the characters would be converted to their unmodified letter equivalents transparently, whereas if they have font reporting on, it would say some dummy font name derived from the character name, like "mathematical italic", but only when entering the block, at least in browse mode and perhaps with the review cursor.
Sukil's add-on solution works well enough but to my knowledge is using a bigger hammer than necessary, also downconverting, say, Greek and Cyrillic letters that may look like Latin letters. This has massive ramifications for people who actually read in those languages.
My proposal would be based on the information available in unicodedata, which you already ship with and make use of. If the character is within this block, extract the font-specifying part of the name for the fake font, and then perform a lookup for the corresponding unmodified character. For example 𝑥 (MATHEMATICAL ITALIC SMALL X) would correspond to x (LATIN SMALL LETTER X). There would be some conditional branching and perhaps regular expression substitution involved, so this might be used to build up an internal dictionary which is used for performance reasons, but it seems like a viable approach for these blocks in particular.
There's not much we can do about people using Cyrillic and Greek letters that look kind of like Latin ones for decorative purposes without alienating anyone actually needing to read that, and the language of the synth can't really be relied on either, as some synths might internally be able to handle multilingual text in different scripts; see Vocalizer's Unicode-based automatic language detection.
And these blocks contain Greek letters as well which should be converted to their corresponding unmodified Greek counterparts, not similar-looking Latin equivalents! This would also keep equation reading intact for the most part, it might only get slightly odd when people stick multiple letters side by side to indicate a series of multiplication. I think at that point it's better to manually read by character anyway, and you can't automatically break series up because of functions. |
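The unicodedata-based proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not NVDA code: the function name `split_math_char` is hypothetical, and instead of rebuilding the target character name (LATIN SMALL LETTER X, and so on) the sketch reads the `<font>` compatibility decomposition that Unicode defines for these symbols. That swap avoids the conditional branching on names, and Greek symbols map to Greek letters rather than to Latin look-alikes. The style name is still taken from the character name with a regular expression, as proposed.

```python
import re
import unicodedata

# Bounds of the Mathematical Alphanumeric Symbols block.
MATH_ALNUM_FIRST = 0x1D400
MATH_ALNUM_LAST = 0x1D7FF

# Captures the style words between "MATHEMATICAL" and CAPITAL/SMALL/DIGIT.
_STYLE = re.compile(r"^MATHEMATICAL (.+?) (?:CAPITAL|SMALL|DIGIT)\b")


def split_math_char(ch):
    """Return (base character, style name or None) for a single character.

    Hypothetical helper: characters outside the block, or without a
    <font> decomposition, are returned unchanged with style None.
    """
    if not MATH_ALNUM_FIRST <= ord(ch) <= MATH_ALNUM_LAST:
        return ch, None
    parts = unicodedata.decomposition(ch).split()
    if len(parts) != 2 or parts[0] != "<font>":
        return ch, None
    base = chr(int(parts[1], 16))
    match = _STYLE.match(unicodedata.name(ch, ""))
    style = match.group(1).lower() if match else None
    return base, style
```

For 𝑥 (MATHEMATICAL ITALIC SMALL X) this returns the pair ("x", "italic"). The results could be cached in a dictionary for performance, as the comment suggests. Symbols whose names do not contain CAPITAL, SMALL or DIGIT (such as MATHEMATICAL BOLD NABLA) still get a base character but no style in this sketch.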
It looks like the solution would involve two things:
Additionally, I'll note the following website as an example which translates standard ASCII characters into these special symbols: If I could assist with this, please let me know. |
See #14781. I've created a template that we can use to integrate the symbols into symbols.dic along with the way they should be pronounced to make equations readable. Then we would only need a way to retrieve the full Unicode name on demand if needed, with either NVDA+comma or by pressing NVDA+dot four times. |
@Adriani90 , just to be clear, this particular issue is not about making equations readable. This is about uses of mathematical symbols that look like alphanumeric characters for things other than equations, such as for creating fancy looking text in social media posts, and making NVDA able to read them as though the text is made up of the equivalent standard uppercase and lowercase letters for whatever standard alphabet is appropriate for the user's particular language settings. As such, the symbols.dic won't help with this, as far as I've been able to determine. |
What you describe is a clear use case for the symbols dictionary. Adding the symbols there will make NVDA speak them no matter where you type them. It doesn't have to be an equation.
|
But it is a very complicated task.
I know this because I am fighting with misuse of Unicode in the RHVoice speech synthesizer.
Thanks!
|
The problem is that it should be set up so that, out of the box, you can choose to default to reading those symbols either as mathematical symbols or as though they're regular letters within words. If you think that this can be done using the symbols dictionary, please explain, but I don't think that that's currently doable with NVDA as-is. |
Hi,
It is not doable, and it cannot be doable without knowing the actual context.
In some language versions of RHVoice, I have solved it by adding these as letter-like symbols rather than as mathematical ones.
|
@HiEv this will be possible as soon as NVDA can report the Unicode name of characters on demand, e.g. by pressing numpad 2 four times or whatever keystroke will be implemented. Currently we can retrieve only the HTML and the Unicode entity syntax but not the actual name. If the Unicode names are implemented, you don't need to switch between letter and Unicode name, because you hear the letter and, if you know the context, you can retrieve the full Unicode name on demand. This does not refer to mathematics only; musical symbols, for example, would be impacted as well. So I will add these symbols to symbols.dic as soon as the corresponding feature is implemented in NVDA. There are already databases providing the full Unicode names in several languages; even Windows seems to have a native Unicode name database which delivers the full names in the current Windows locale. For more details, see #14781 and the corresponding comments. |
@Adriani90 - Again, just to be clear, are you saying that it will be possible for NVDA to read the string "𝒕𝒉𝒊𝒔 𝒊𝒔 𝒂 𝒕𝒆𝒔𝒕", which is a string made up of only Unicode symbols that merely look like standard letters, so that it's pronounced exactly the same as the string "this is a test", a string which is made up purely of standard English ASCII letters? Because it sounds like you're saying it would be read as the names of those characters, like "T-symbol H-symbol I-symbol S-symbol, etc..." instead, when that's not what we're talking about here. We don't want the names of each of those symbols, as if they're being used in some sort of equation; instead we want an option so that those symbols can be treated as if they're identical to the letters that they visually look like. Issue #14781 is about mathematical equations; this issue is not about mathematical equations. This issue is about using those mathematical symbols as though they're regular letters in standard text. |
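As an illustration of the behavior requested above, the following Python sketch folds a "fancy" string back to plain letters with Unicode compatibility normalization (NFKC). It is only meant to show that the mapping exists in the Unicode data; it is not a description of how NVDA processes speech. Both helper names, `to_bold_italic` and `fold_compat`, are hypothetical.

```python
import unicodedata


def to_bold_italic(text):
    """Build a 'fancy' test string; assumes lowercase ASCII letters and spaces."""
    first = ord(unicodedata.lookup("MATHEMATICAL BOLD ITALIC SMALL A"))
    return "".join(
        chr(first + ord(c) - ord("a")) if "a" <= c <= "z" else c for c in text
    )


def fold_compat(text):
    """Replace compatibility characters with their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


fancy = to_bold_italic("this is a test")
print(fold_compat(fancy))  # prints: this is a test
```

NFKC is broader than this one block: it also folds ligatures, superscript digits and fullwidth forms, so whether it is appropriate as a default is a separate design question. Greek mathematical symbols fold to Greek letters, not to Latin look-alikes.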
His proposal sounds like a replacement for the character information add-on which might indeed be useful, although there are many cases where CLDR fails to have a description for that symbol so the unicodedata thing has to be used. |
@Adriani90 wrote:
Please have a look at Character Information V3.0, released a few days ago, to see if it fits your needs. Anyway, this is quite off-topic for this issue. |
@CyrilleB79 thank you very much for this great work, this really opens up some possibilities. I get some errors while testing as you can read below. Test results with Charinfo addon and NVDA with my adjusted symbols.dic where I added all mathematical alphanumeric symbols. Using left and right arrow or ctrl+left and right or whatever on the alphanumeric characters in this issue description: Retrieving the detailed character name via the addon with numpad 2 twice:
In this case I think it is still huge progress to implement the alphanumeric symbols in the symbols.dic file with the way they should be pronounced in the reading flow, and to advise users to use the character info addon for detailed character information, mentioning that this detailed information is still available only in English. However, at least in the mathematical world I have never heard a person pronouncing e.g. 𝒥 as "mathematical script capital j". So this full Unicode name might be relevant for some people, but definitely not for people that just want to read text where these symbols appear. @CyrilleB79 in case you are able to solve the errors and you can succeed in retrieving the full Unicode name of the character in the local language, this would be a huge improvement, but even with only the English character name we can start down the right path. |
@seanbudd would NV Access accept a PR with the mathematical alphanumeric symbols implemented as I suggested, advising users in the User Guide that they should use the character info addon in case they need the full name of the character? |
@Adriani90 As explained to you by both CyrilleB79 and myself, your comments continue to be off-topic in this issue. This issue is NOT about the symbols being used for mathematics. This is about the symbols being used as normal text. If you have an issue to discuss other than that one, please discuss it in an existing relevant issue or create a new issue. Thank you. |
To be honest, I don't quite follow you. Please tell us exactly what your expected behavior is. Do you want these characters to be read as letters or not? If yes, then I think you didn't understand my comments; maybe they were too technical.
|
@Adriani90 wrote:
We want the option for the characters to be able to be treated as their standard alphanumeric equivalent so they can be read as if they were a part of normal text. I've seen you talk about these symbols' use and pronunciation in mathematics and also about merely speaking the letters several times, so perhaps I'm misinterpreting things, but neither of those is particularly relevant to what is needed to solve this issue. For example, you talked about "the way [the symbols] should be pronounced to make equations readable." These aren't equations, so that's not the issue. In the message I'm replying to, you asked about whether this is about the "characters to be read as letters," which sounds like you're talking about naming each letter, rather than interpreting them as though they're standard characters in a string of text. It's possible there's a language issue here, as it appears that you're a native German speaker, so if I'm understanding you incorrectly, I do apologize, but your comments have been, at best, unclear if you were talking about what this issue is actually related to, and at worst, seemed to be about another thing altogether. |
@HiEv I think you totally misunderstood me.
I hope I have made myself clearer, but you will understand this if you read my comment above carefully. |
@Adriani90 OK, I think we are talking about the same issue then, it's just that A) you brought up things irrelevant to the topic (such as the first two points above) and B) the phrasing you used was often unclear (such as the examples I gave previously), which threw me off. Anyways, glad we're on the same page and apologies for my misunderstandings. |
This is now fixed in #16521. @LeonarddeR it would be great if you enable that by default and make it a checkbox. |
As long as this is not the default, I wouldn't consider this fixed. Also note that this is not listed in the change log nor in the documentation. |
Fixup of #16521
Fixes #11570
Partial fix for #4631
Summary of the issue:
It turns out that rawTextTypeforms on a region may be None, this was an oversight on my end. cursorPos may also be None. @burmancomp reported a zero division error in case a string ended with a non breaking space and a space.
Description of user facing changes
No longer errors in the log when getting flash messages in Thunderbird and/or reading messages in WhatsApp UWP.
Description of development approach
Explicitly check for None typeforms and cursorPos, thereby improving readability as well. Improve the calculateOffsets method in textUtils to ensure it can handle the case as reported by @burmancomp
Is your feature request related to a problem? Please describe.
Currently, mathematical alphanumeric symbols are gaining traction, both in social media (in nicknames for example), and when viewing LaTeX documents converted to docx (I have encountered this twice).
Describe the solution you'd like
It would be useful to have the Mathematical alphanumeric symbols included in NVDA.
Describe alternatives you've considered
This could go to eSpeak NG too, like they do with Cyrillic letters. I'm currently considering both options as equally valid, but I'm biased towards eSpeak NG mainly because that is the only synth I use, even though OneCore users wouldn't benefit from that.
Additional context
Unfortunately I do not know how many of these symbols are widely used (I believe not all of them are, but I don't have any evidence for or against it). Also, to make matters worse, they work on similarity. In this Agora discussion thread it says that "Judicial Jocularity Act" and "𝒥𝓊𝒹𝒾𝒸𝒾𝒶𝓁 𝒥𝑜𝒸𝓊𝓁𝒶𝓇𝒾𝓉𝓎 𝒜𝒸𝓉" are in fact the same, but I find some letter mismatches with the Character Information Addon. I believe, however, that just putting the characters here, in eSpeak NG or wherever, will improve the situation greatly.