Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocess and convert C1 controls to their 7 bit equivalent #7340

Merged
26 commits merged into from
Sep 16, 2020

Conversation

skyline75489
Copy link
Collaborator

@skyline75489 skyline75489 commented Aug 19, 2020

C1 control characters are now first converted to their 7 bit equivalent.
This allows us to unify the logic of C1 and C0 escape handling. This
also adds support for SOS/PM/APC string.

  • Unify the logic for C1 and C0 escape handling by converting C1 to C0
    beforehand. This adds support for various C1 characters, including
    IND(8/4), NEL(8/5), HTS(8/8), RI(8/13), SS2(8/14), SS3(8/15),
    OSC(9/13), etc.
  • Add support for SOS/PM/APC escape sequences. Fixes VT SOS, APC & PM should be ignored until supported #7032
  • Use "Variable Length String" logic to unify the string termination
    handling of OSC, DCS and SOS/PM/APC. This fixes an issue where OSC
    action is successfully dispatched even when terminated with non-ST
    character. Introduced by Add initial support for VT DCS sequences #6328, the DCS PassThrough is spared from
    this issue. This PR puts them together and add test cases for them.

References:
https://vt100.net/docs/vt510-rm/chapter4.html
https://vt100.net/emu/dec_ansi_parser

Closes #7032
Closes #7317

@ghost ghost added Area-VT Virtual Terminal sequence support Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Conhost For issues in the Console codebase labels Aug 19, 2020
{
return wch == L'\x90';
return wch - L'\x40';
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems correct but I can't find valid reference for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I believe this is correct. Quoting the DEC ST 070 manual:

expansion escape sequence (Fe) - 2-character escape sequence in which the final character is in columns 4 and 5, and is used to extend the 7-bit code table by being the row equivalent to the corresponding C1 set of controls in columns 8 and 9 of an 8-bit code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Should we prefer -0x40 or &~0x40 here?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I choose L'\x40' cuz I want it have the same type as wch. Not that it mean anything, but hey, no implicit conversion.

@zadjii-msft zadjii-msft self-assigned this Aug 19, 2020
// Arguments:
// - wch - Character that triggered the event
// Return Value:
// - <none>
void StateMachine::_EventDcsIgnore(const wchar_t wch) noexcept
void StateMachine::_EventDcsIgnore() noexcept
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s a little weird to have just one case in this, isn’t it. But this is actually the correct way to do this.

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this looks good to me. I'd love to make sure we get @j4james to sign off on this too, if he's willing.

src/terminal/parser/stateMachine.cpp Outdated Show resolved Hide resolved
@zadjii-msft zadjii-msft removed their assignment Aug 26, 2020
Copy link
Collaborator

@j4james j4james left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I've run a few basic tests with it, and the C1 controls are now working as expected, and the APC/PM/SOS sequences are being appropriately ignored. I'm a little concerned about the input side of things, because I'm not how to test that, but I think the C1 mapping should still be OK for that.

@skyline75489
Copy link
Collaborator Author

Thanks @j4james . I literally couldn't write any of these PRs without you . You know originally I just want those dazzling Sixel images. But who knows VT sequences are actually full of fun. Be prepared for my following unknown numbers of PRs that's VT related 😄 .

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - I finally read over this, and I think I understand how this improves our state machine. I've got to trust Mike and James here, as well. Thanks @skyline75489!

@DHowett
Copy link
Member

DHowett commented Sep 16, 2020

@msftbot merge this in 1 minute

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Sep 16, 2020
@ghost
Copy link

ghost commented Sep 16, 2020

Hello @DHowett!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I won't merge this pull request until after the UTC date Wed, 16 Sep 2020 22:30:14 GMT, which is in 1 minute

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@ghost ghost merged commit f91b53d into microsoft:master Sep 16, 2020
ghost pushed a commit that referenced this pull request Sep 17, 2020
## Summary of the Pull Request

This fixes a typo in the `HyperlinkIdConsistency` unit test which was causing that test to fail. It was mistakenly using a `/` instead of `\` for the string terminator sequences.

## References

The test initially worked because of a bug in the state machine parser, but that bug was recently fixed in PR #7340.

## PR Checklist
* [x] Closes #7654
* [x] CLA signed. 
* [x] Tests passed
* [ ] Documentation updated. 
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan.

## Validation Steps Performed

I've run the test again and it now passes.
@skyline75489 skyline75489 deleted the chesterliu/dev/c1-to-c0 branch September 22, 2020 10:28
@ghost
Copy link

ghost commented Sep 22, 2020

🎉Windows Terminal Preview v1.4.2652.0 has been released which incorporates this pull request.:tada:

Handy links:

@DHowett
Copy link
Member

DHowett commented Oct 14, 2020

This fix was just released as part of Windows in Insider Build 20236!

skyline75489 referenced this pull request in yatli/console Oct 15, 2020
ghost pushed a commit that referenced this pull request Apr 30, 2021
This PR introduces a mechanism via which DCS data strings can be passed
through directly to the dispatch method that will be handling them, so
the data can be processed as it is received, rather than being buffered
in the state machine. This also simplifies the way string termination is
handled, so it now more closely matches the behaviour of the original
DEC terminals.

* Initial support for DCS sequences was introduced in PR #6328.
* Handling of DCS (and other) C1 controls was added in PR #7340.
* This is a prerequisite for Sixel (#448) and Soft Font (#9164) support.

The way this now works, a `DCS` sequence is dispatched as soon as the
final character of the `VTID` is received. Based on that ID, the
`OutputStateMachineEngine` should forward the call to the corresponding
dispatch method, and its the responsibility of that method to return an
appropriate handler function for the sequence.

From then on, the `StateMachine` will pass on all of the remaining bytes
in the data string to the handler function. When a data string is
terminated (with `CAN`, `SUB`, or `ESC`), the `StateMachine` will pass
on one final `ESC` character to let the handler know that the sequence
is finished. The handler can also end a sequence prematurely by
returning false, and then all remaining data bytes will be ignored.

Note that once a `DCS` sequence has been dispatched, it's not possible
to abort the data string. Both `CAN` and `SUB` are considered valid
forms of termination, and an `ESC` doesn't necessarily have to be
followed by a `\` for the string terminator. This is because the data
string is typically processed as it's received. For example, when
outputting a Sixel image, you wouldn't erase the parts that had already
been displayed if the data string is terminated early.

With this new way of handling the string termination, I was also able to
simplify some of the `StateMachine` processing, and get rid of a few
states that are no longer necessary. These changes don't apply to the
`OSC` sequences, though, since we're more likely to want to match the
XTerm behavior for those cases (which requires a valid `ST` control for
the sequence to be accepted).

## Validation Steps Performed

For the unit tests, I've had to make a few changes to some of the
`OutputEngineTests` to account for the updated `StateMachine`
processing. I've also added a new `StateMachineTest` to confirm that the
data strings are correctly passed through to the string handler under
all forms of termination.

To test whether the framework is actually usable, I've been working on
DRCS Soft Font support branched off of this PR, and haven't encountered
any problems. To test the throughput speed, I also hacked together a
basic Sixel parser, and that seemed to perform reasonably well.

Closes #7316
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-VT Virtual Terminal sequence support AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Conhost For issues in the Console codebase
Projects
None yet
4 participants