Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic 19 number? #39

Open
Shakeskeyboarde opened this issue May 20, 2024 · 12 comments
Open

Magic 19 number? #39

Shakeskeyboarde opened this issue May 20, 2024 · 12 comments

Comments

@Shakeskeyboarde
Copy link

Shakeskeyboarde commented May 20, 2024

When an ESC/CSI is found, a character is parsed in the next 19 bytes utf-16 code units:

string = string.slice(offset, offset + 19);

Why 19?

@sindresorhus
Copy link
Member

I assume it's the max ansi escape length.

400a6ca

@sindresorhus
Copy link
Member

But maybe @AlCalzone can clarify. It was added in 29f76a4.

@AlCalzone
Copy link
Contributor

AlCalzone commented May 21, 2024

It's supposed to be the max. escape sequence length, essentially ESC[38;2;{r};{g};{b}m - ESC is one char, r, g, b can expand to 3 each.

@Shakeskeyboarde
Copy link
Author

Shakeskeyboarde commented May 21, 2024

Thanks for clarifying. However, this means that if somehow the code doesn't end in an m, it just captures 19 code units and marks them as an ansi code? Even though it doesn't match the code pattern that the 19 value is derived from? Shouldn't it be considered not a code?

@AlCalzone
Copy link
Contributor

Hmm I think you have a point there.

@Qix-
Copy link
Member

Qix- commented May 21, 2024

let endIndex = string.indexOf('m', startIndex);

Reading the entire function is important for context. This function bails if the maximum length doesn't have an m, which in all implementations will bail the control code even if it's malformed.

I don't see an issue with the code as-is. If you think there's an issue, I'd love to see a repro.

@AlCalzone
Copy link
Contributor

AlCalzone commented May 21, 2024

Technically it consumes the entire string instead of bailing, which is probably not what @Shakeskeyboarde expects. And I think ignoring the potential escape sequence at the current location might make more sense.

@Qix-
Copy link
Member

Qix- commented May 21, 2024

I'm not sure what you mean. It consumes the whole string (by reference...) and iterates up to 19 times over it when looking to parse the sequence.

What issue are you really getting at?

@AlCalzone
Copy link
Contributor

index += code.length;

In the case mentioned above, parseAnsiCode returns a 19-character substring of the input. As a result, code is truthy and the next iteration starts 19 characters later. This means if there is an actual ansi code in the 18 chars following the escape, it will be skipped.

@Qix-
Copy link
Member

Qix- commented May 22, 2024

Ah okay. I see the point now.

This is also very wrong. there's no max length. \x1b[0;0;0;0;0;0;0;0;0;0;0;0m is a valid render mode escape. So is \x1b[0;1;38;2;123;123;123;48;2;456;456;456m, etc.

This shouldn't be constraining itself to an arbitrary length.

@Shakeskeyboarde
Copy link
Author

Shakeskeyboarde commented May 22, 2024

Ah okay. I see the point now.

This is also very wrong. there's no max length. \x1b[0;0;0;0;0;0;0;0;0;0;0;0m is a valid render mode escape. So is \x1b[0;1;38;2;123;123;123;48;2;456;456;456m, etc.

This shouldn't be constraining itself to an arbitrary length.

A better solution would be to walk string until a terminator is reached, or an invalid character is encountered. So, match the CSI, then a number, then ; + number repeating, until m. If an invalid character is encountered, either ignore the whole thing, continuing from the CSI+1 character, or throw out everything up to the invalid character, assuming the code is incomplete.

That brings up another point. This doesn't seem to match \x1b[m, which is a valid reset code, because it never finds the number it's expecting.

@Qix-
Copy link
Member

Qix- commented May 22, 2024

That's also correct, yeah. Worth mentioning that EOF should be considered as "invalid" when trying to parse a code and whatever came before it, after and including the CSI, should still be considered an escape. It's the closest you can get to how emulators would parse it, given that they're done using state machines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants