Revise escape sequence scanning #126
@mmatera There is still a lot to do: writing tests and making sure we handle errors (and everything else) correctly. But since this is getting large, here is a heads-up on what's on the horizon.
Force-pushed from a5b13d0 to 1ab77fa.
Escape sequences other than named characters have been moved from the prescanner into the scanner.
Handle syntax errors in mathics3-tokens.
Rename Tokenizer.code -> Tokenizer.source_text and Tokenizer.incomplete -> Tokenizer.get_more_input.
Start showing syntax errors.
In particular, errors involving octal digits and incomplete named-character errors. Go over the docstrings in escape_sequences.py and add more tests.
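To make the octal-digit case concrete, here is a minimal sketch of decoding a Wolfram Language octal escape such as \101 (three octal digits) and raising an error on a bad or missing digit. The function name parse_octal_escape and the EscapeSyntaxError class below are local stand-ins for illustration, not the project's actual code or signatures.

```python
from __future__ import annotations

OCTAL_DIGITS = "01234567"


class EscapeSyntaxError(Exception):
    """Local stand-in for the scanner's escape-sequence error."""


def parse_octal_escape(text: str, pos: int) -> tuple[str, int]:
    """Decode the three octal digits starting at text[pos].

    `pos` points just past the backslash. Returns the decoded character
    and the index of the first character after the escape.
    """
    digits = text[pos : pos + 3]
    if len(digits) < 3:
        # Input ended (or the escape was cut short) before three digits.
        raise EscapeSyntaxError(f"incomplete octal escape: {digits!r}")
    for offset, digit in enumerate(digits):
        if digit not in OCTAL_DIGITS:
            raise EscapeSyntaxError(
                f"invalid octal digit {digit!r} at position {pos + offset}"
            )
    return chr(int(digits, 8)), pos + 3


print(parse_octal_escape(r"\101", 1))  # ('A', 4)
```

Reporting the offending digit's position, rather than just "bad escape", is what lets a caller point at the exact character in the error message.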
named-characters.yml: \[Mu] is letterlike. tokeniser.py: correct the identifier/pattern for names containing letterlike escape sequences, and also add Theta to the list of letterlike symbols.
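A symbol pattern that accepts letterlike escapes can be sketched as a regular expression in which a letterlike named character such as \[Mu] may appear wherever a letter can. This is an illustration only: LETTERLIKE_NAMES is a hypothetical two-entry subset (the real list would come from named-characters.yml), contexts (`) are ignored, and SYMBOL_RE and is_symbol are not the tokeniser's actual names.

```python
import re

# Hypothetical subset of letterlike named characters.
LETTERLIKE_NAMES = ("Mu", "Theta")

# A letterlike escape such as \[Mu]: backslash, "[", a listed name, "]".
_LETTERLIKE = r"\\\[(?:" + "|".join(map(re.escape, LETTERLIKE_NAMES)) + r")\]"

# A symbol starts with a letter, "$", or a letterlike escape ...
_SYMBOL_START = rf"(?:[A-Za-z$]|{_LETTERLIKE})"
# ... and continues with letters, digits, "$", or letterlike escapes.
_SYMBOL_REST = rf"(?:[A-Za-z0-9$]|{_LETTERLIKE})"

SYMBOL_RE = re.compile(_SYMBOL_START + _SYMBOL_REST + "*")


def is_symbol(text: str) -> bool:
    """True if the whole of `text` is a single symbol name."""
    return SYMBOL_RE.fullmatch(text) is not None


print(is_symbol(r"a\[Mu]1"))  # True
```

Building the start and continuation pieces once and reusing the letterlike sub-pattern in both keeps the two in sync when a name such as Theta is added.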
Replace .format() with f-strings. Add comments around the Symbol pattern. sntx_message(): the exception now saves name, tag, and args.
Not sure how this worked before, but it did.
* "$\" is a thing.
* Correct the EscapeSyntaxError error message.
* Better Symbol tokenization for things like a\[Mu]1. More in the next commit, though.
for things like \.78\.79. Improve comments around DRYing the identifier/symbol_name extension.
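Sequences like \.78\.79 are hexadecimal character escapes: as we understand Wolfram Language syntax, \.hh takes two hex digits, \:hhhh four, and \|hhhhhh six, so \.78\.79 decodes to "xy". The sketch below decodes only these three forms; decode_hex_escapes, HEX_ESCAPE_WIDTHS, and the local EscapeSyntaxError are illustrative names, not the project's API.

```python
import string


class EscapeSyntaxError(Exception):
    """Local stand-in for the scanner's escape-sequence error."""


# Introducer after the backslash -> number of hex digits that follow.
HEX_ESCAPE_WIDTHS = {".": 2, ":": 4, "|": 6}


def decode_hex_escapes(text: str) -> str:
    """Replace every \\.hh, \\:hhhh and \\|hhhhhh escape with its character."""
    out = []
    i = 0
    while i < len(text):
        introducer = text[i + 1 : i + 2]
        if text[i] == "\\" and introducer in HEX_ESCAPE_WIDTHS:
            width = HEX_ESCAPE_WIDTHS[introducer]
            digits = text[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise EscapeSyntaxError(
                    f"expected {width} hex digits after \\{introducer} at position {i}"
                )
            code = int(digits, 16)
            if code > 0x10FFFF:
                # Six hex digits can exceed the Unicode range.
                raise EscapeSyntaxError(f"code point {digits} out of range")
            out.append(chr(code))
            i += 2 + width
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(decode_hex_escapes(r"\.78\.79"))  # xy
```

This sketch ignores every other escape, so, for example, an escaped backslash directly before .78 is not special-cased; a real scanner has to handle escapes in one place to get such interactions right.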
This PR has grown too large; we'll break it up into smaller chunks.
NamedCharacterSyntax should be a new-style TranslateError. Rename self.code -> self.source_text. Miscellaneous sntx_message() fixes. Document better.
Force-pushed from ef9b7c5 to 53b1402.
Force-pushed from 53b1402 to 74587cc.
TranslateError, TranslateErrorNew, and ScanError are now consolidated into ScannerError.
It should be slightly faster (and it is more modern).
Use a simpler, more direct error class name that is more in line with its other subclassed errors.
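One way to picture the consolidation is a single ScannerError base class that stores the message name and tag, with more specific errors subclassing it so callers need just one except clause. This is a sketch under assumptions: the constructor signature, the report() helper, and the "sntufn" tag are illustrative, and the real mathics-scanner classes may differ.

```python
class ScannerError(Exception):
    """Base class for scanner errors (sketch).

    `name` and `tag` identify the message, as in Name::tag; any remaining
    positional arguments are the message arguments and end up in self.args.
    """

    def __init__(self, name: str, tag: str, *args):
        super().__init__(*args)
        self.name = name
        self.tag = tag


class EscapeSyntaxError(ScannerError):
    """Malformed escape sequence, e.g. a bad octal or hex digit."""


class NamedCharacterSyntaxError(ScannerError):
    """Unknown or incomplete named-character escape."""


def report(exc: ScannerError) -> str:
    """Format an error as Name::tag followed by its message arguments."""
    return f"{exc.name}::{exc.tag} {list(exc.args)}"


try:
    # The tag here is chosen for illustration only.
    raise NamedCharacterSyntaxError("Syntax", "sntufn", "Foo")
except ScannerError as err:  # one clause covers every subclass
    message = report(err)

print(message)  # Syntax::sntufn ['Foo']
```

Passing only the message arguments to Exception.__init__ keeps them in the standard args attribute, while name and tag live in their own attributes rather than being mixed into args.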
mmatera left a comment:
LGTM. Thanks!
@mmatera Merging needs to be coordinated with Mathics3/mathics-core#1403, since exception handling has been revised. (In a previous PR, you commented on the use of …) After this PR, we'll still need to handle boxing expressions inside strings. Boxing expressions outside strings work, I think. But I haven't been able to get to galatea to understand what is expected versus not.
Force-pushed from fa1155d to 2422c60.
An invalid escape sequence inside a string, like "\(a \+\)", is not an error. Instead, the sequence is left unchanged, e.g. "\(a \+\)".
If the escape sequence in a string can be a boxing construct, then it is not an escape-sequence error. Otherwise, it is.
For example, "\(" is not an error in a string, while "\g" is.
Yes, this is a bit involved, but that's the way WA works.
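The string rule above can be sketched as a small function over the text between a string's quotes: known escapes are decoded, a backslash followed by a possible boxing-construct character is kept verbatim, and anything else is an error. The BOXING_CONSTRUCT_SUFFIXES set here is an illustrative guess, not the project's actual contents, and SIMPLE_ESCAPES, scan_string_body, and the local EscapeSyntaxError are hypothetical names.

```python
class EscapeSyntaxError(Exception):
    """Local stand-in for the scanner's escape-sequence error."""


# Illustrative set of characters that can follow a backslash in a boxing
# construct; the project's BOXING_CONSTRUCT_SUFFIXES may differ.
BOXING_CONSTRUCT_SUFFIXES = frozenset("()!*^_@%&+/`")

# Escapes decoded to a single character (incomplete, for illustration).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def scan_string_body(body: str) -> str:
    """Process the escapes in the text between a string's quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise EscapeSyntaxError("string body ends with a lone backslash")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
        elif nxt in BOXING_CONSTRUCT_SUFFIXES:
            # Could be part of a boxing construct: keep it verbatim, no error.
            out.append("\\" + nxt)
        else:
            raise EscapeSyntaxError(f"unknown escape sequence \\{nxt} at position {i}")
        i += 2
    return "".join(out)


print(scan_string_body(r"\(a \+\)"))  # \(a \+\)
```

Checking the known escapes before the boxing set means a doubled backslash is always decoded as one backslash, never mistaken for the start of a box construct.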
Also, flatten the values in box operators for BOXING_CONSTRUCT_SUFFIXES.
Refactor the scanner. Remove the prescanner. Escape-sequence handling is no longer a separate phase; instead, it is integrated into the scanning phase.
Fixes #125
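A minimal sketch of what "integrated into the scanning phase" can look like, assuming a toy token set and a one-entry named-character table (tokenize, decode_escape, and NAMED_CHARACTERS are hypothetical names, not the mathics-scanner API). Outside strings, escapes are decoded in place as a symbol is scanned; string bodies are captured raw so string-specific rules can decide what an escape means there. That inside-or-outside-a-string context is something a pass running before tokenization would have to track on its own.

```python
from __future__ import annotations

import string


class EscapeSyntaxError(Exception):
    """Local stand-in for the scanner's escape-sequence error."""


# Hypothetical one-entry named-character table.
NAMED_CHARACTERS = {"Mu": "\u03bc"}


def decode_escape(text: str, i: int) -> tuple[str, int]:
    """Decode the escape at text[i] (a backslash) outside a string.

    Only \\.hh and \\[Name] are handled in this sketch.
    """
    kind = text[i + 1 : i + 2]
    if kind == ".":
        digits = text[i + 2 : i + 4]
        if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
            raise EscapeSyntaxError(f"bad \\. escape at position {i}")
        return chr(int(digits, 16)), i + 4
    if kind == "[":
        end = text.find("]", i + 2)
        if end == -1:
            raise EscapeSyntaxError(f"incomplete named character at position {i}")
        name = text[i + 2 : end]
        if name not in NAMED_CHARACTERS:
            raise EscapeSyntaxError(f"unknown named character {name!r}")
        return NAMED_CHARACTERS[name], end + 1
    raise EscapeSyntaxError(f"unsupported escape at position {i}")


def tokenize(text: str) -> list[tuple[str, str]]:
    """Single pass: escapes are decoded when the scanner reaches them."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # Capture the raw body; skip over any escaped character.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            if j >= len(text):
                raise EscapeSyntaxError("unterminated string")
            tokens.append(("String", text[i + 1 : j]))
            i = j + 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("Number", text[i:j]))
            i = j
        elif ch.isalpha() or ch == "\\":
            # Outside strings, escapes become part of the symbol name.
            chars = []
            while i < len(text) and (text[i].isalnum() or text[i] == "\\"):
                if text[i] == "\\":
                    decoded, i = decode_escape(text, i)
                    chars.append(decoded)
                else:
                    chars.append(text[i])
                    i += 1
            tokens.append(("Symbol", "".join(chars)))
        else:
            tokens.append(("Operator", ch))
            i += 1
    return tokens


print(tokenize(r"\.78\.79 + f[10]"))
```

Because decoding happens only when the scanner reaches a backslash, an escape that turns out to be malformed is reported at the position where it occurs in the source text.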