Skip to content

Simplify things by lexing ";" in fieldsFromLine#10

Merged
iten-alg merged 1 commit into
iten-alg:semicolonsfrom
jannotti:semi-as-token
Aug 9, 2022
Merged

Simplify things by lexing ";" in fieldsFromLine#10
iten-alg merged 1 commit into
iten-alg:semicolonsfrom
jannotti:semi-as-token

Conversation

@jannotti
Copy link
Copy Markdown
Collaborator

@jannotti jannotti commented Aug 7, 2022

See if you like it.

The main idea was move semi-colon processing into fieldsFromLine as that feels like the right place to be looking at the input character by character. Has the nice advantage of simplifying processFields, and allowing ';' to split tokens, rather than requiring a space afterward.

I also removed some duplicate line processing from before the processFields loop. Does that seem good?

It's slightly bigger than it needs to be (sorry) because I fixed a typo, and I made the "set" map use a bool instead of int.

Also fixed a wild, ancient bug:
byte base64 base64 //comment
was not broken up right, because the base64 bytes were "base64"!

inst := strings.Count(operation, ";") + strings.Count(operation, "\n")
source := prefix + ";" + strings.Repeat(operation+";", 2000) + ";" + suffix
source = strings.ReplaceAll(source, ";", "\n")
source := prefix + ";" + strings.Repeat(operation+"\n", 2000) + ";" + suffix
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just to make assembly take less time or was there an error?

Copy link
Copy Markdown
Collaborator Author

@jannotti jannotti Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing line 3536 was just because it didn't seem like we should have to replace semi-colons anymore. But with that done, the other change turned out to be necessary. bufio.Scanner has a MaxTokenSize. Since we ask it to read entire lines, a "token" is an entire line. So, as I removed substitutions of "\n" for ";", our benchmarks had enormous lines that exceeded that limit. By adding a newline in the main repeated clause, we avoid the problem. (And, I inserted some code to check scanner.Err() in case we exceed it in the future, changing the error to "too long line" when it happens.)

}
currentLine := strings.Join(current, " ")
if strings.HasPrefix(currentLine, "#pragma") {
if opstring == "#pragma" {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may break some of the tests, but I think a pretty reasonable rule would be that # directives have to go on their own line. Any strong thoughts on that?

Copy link
Copy Markdown
Collaborator Author

@jannotti jannotti Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule now is "If you have a #pragma, it consumes the rest of the line, no matter what" so as we make that true of all "# directives" that should handle most of the ideas I've thought of, where the expansion contains semi-colons.

I could see only allowing it at the start of lines though.

}
if strings.HasPrefix(opstring, "//") {
// semicolon inside comment is not counted as newline
ops.trace("%3d: comment\n", ops.sourceLine)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually an impossible case that came from when I had a different implementation (prior to what I PR'd), but I'll think about it a bit more


var spaces = [256]uint8{'\t': 1, ' ': 1}
// semi-colon is quite space-like, so include it
var spaces = [256]bool{'\t': true, ' ': true, ';': true}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change, but I will say it does mean that we lose the ability to make some weird macros like a macro beginning with a semicolon as we will now separate the semicolon and macro into two distinct tokens. I think that's fine but just something to keep track of

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth keeping in mind, but I can't think of an macro that would want to terminate the last "line" from the beginning.

Copy link
Copy Markdown
Collaborator Author

@jannotti jannotti Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but it might not be as bad as you're thinking. Although ";" is its own token, since "# directives" will get the whole rest of the line, I think "#define x y; z" will still cause x to become "x ; z"

@iten-alg iten-alg merged commit b8c973d into iten-alg:semicolons Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants