Simplify things by lexing ";" in fieldsFromLine#10
Conversation
| inst := strings.Count(operation, ";") + strings.Count(operation, "\n") | ||
| source := prefix + ";" + strings.Repeat(operation+";", 2000) + ";" + suffix | ||
| source = strings.ReplaceAll(source, ";", "\n") | ||
| source := prefix + ";" + strings.Repeat(operation+"\n", 2000) + ";" + suffix |
There was a problem hiding this comment.
Is this just to make assembly take less time or was there an error?
There was a problem hiding this comment.
Removing line 3536 was just because it didn't seem like we should have to replace semi-colons anymore. But with that done, the other change turned out to be necessary. bufio.Scanner has a MaxTokenSize. Since we ask it to read entire lines, a "token" is an entire line. So, as I removed substitutions of "\n" for ";", our benchmarks had enormous lines that exceeded that limit. By adding a newline in the main repeated clause, we avoid the problem. (And, I inserted some code to check scanner.Err() in case we exceed it in the future, changing the error to "too long line" when it happens.)
| } | ||
| currentLine := strings.Join(current, " ") | ||
| if strings.HasPrefix(currentLine, "#pragma") { | ||
| if opstring == "#pragma" { |
There was a problem hiding this comment.
This may break some of the tests, but I think a pretty reasonable rule would be that # directives have to go on their own line. Any strong thoughts on that?
There was a problem hiding this comment.
The rule now is "If you have a #pragma, it consumes the rest of the line, no matter what" so as we make that true of all "# directives" that should handle most of the ideas I've thought of, where the expansion contains semi-colons.
I could see only allowing it at the start of lines though.
| } | ||
| if strings.HasPrefix(opstring, "//") { | ||
| // semicolon inside comment is not counted as newline | ||
| ops.trace("%3d: comment\n", ops.sourceLine) |
There was a problem hiding this comment.
I think this is actually an impossible case that came from when I had a different implementation (prior to what I PR'd), but I'll think about it a bit more
|
|
||
| var spaces = [256]uint8{'\t': 1, ' ': 1} | ||
| // semi-colon is quite space-like, so include it | ||
| var spaces = [256]bool{'\t': true, ' ': true, ';': true} |
There was a problem hiding this comment.
I like this change, but I will say it does mean that we lose the ability to make some weird macros like a macro beginning with a semicolon as we will now separate the semicolon and macro into two distinct tokens. I think that's fine but just something to keep track of
There was a problem hiding this comment.
Worth keeping in mind, but I can't think of an macro that would want to terminate the last "line" from the beginning.
There was a problem hiding this comment.
Oh, but it might not be as bad as you're thinking. Although ";" is its own token, since "# directives" will get the whole rest of the line, I think "#define x y; z" will still cause x to become "x ; z"
See if you like it.
The main idea was move semi-colon processing into
fieldsFromLineas that feels like the right place to be looking at the input character by character. Has the nice advantage of simplifying processFields, and allowing ';' to split tokens, rather than requiring a space afterward.I also removed some duplicate line processing from before the
processFieldsloop. Does that seem good?It's slightly bigger than it needs to be (sorry) because I fixed a typo, and I made the "set" map use a bool instead of int.
Also fixed a wild, ancient bug:
byte base64 base64 //comment
was not broken up right, because the base64 bytes were "base64"!