Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conhost hangs when a lot of content is output in a single write #11794

Open
j4james opened this issue Nov 20, 2021 · 22 comments
Open

Conhost hangs when a lot of content is output in a single write #11794

j4james opened this issue Nov 20, 2021 · 22 comments
Labels
Area-Performance Performance-related issue Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase
Milestone

Comments

@j4james
Copy link
Collaborator

j4james commented Nov 20, 2021

Windows Terminal version

n/a

Windows build number

10.0.19041.1348

Other Software

No response

Steps to reproduce

  1. Open a cmd shell in conhost.
  2. Run the C++ example further down in the comments: link.

The Python example below does not reproduce the issue reliably on all systems.

1. Open a WSL shell in conhost. 2. Execute the Python script below.
import math
import os
import sys

size = os.popen('stty size', 'r').read().split()
h,w = tuple(int(n) for n in size)
mx = w//2
my = h//2 

def frame(i):
  s = '\033[H'
  for y in range(h):
    for x in range(w):
      dy,dx = y-my,x-mx
      a = math.atan2(dy*2,dx) + math.pi
      c = (int(a/math.pi*127)+i)%256
      s += '\033[38;2;%d;%d;%dm*' % (c,c,c)
  return s

sys.stdout.write('Generating content...\n')
s = '\033[?25l'
for i in range(512):
  s += frame(i)
s += '\033[?25h\033[m\033[H'

sys.stdout.write('Starting animation...\n')
sys.stdout.write(s)

This constructs a little VT animation which it outputs with a single write call.

Expected Behavior

You should see a rotating pattern, a bit like a radar scan. This is what it looks like in MinTTY (using wsltty):

Rotate.mp4

You'll see the same sort of thing in XTerm, VTE, Alacritty, Kitty, etc.

Actual Behavior

Conhost displays the first couple of lines of the first frame (if anything), then hangs for a couple of seconds, and eventually just shows the final frame.

Windows Terminal is a little better, in that it doesn't hang, but you still don't see the animation - just the final frame.

I believe the problem is that the buffer is locked for the entire time that the VT parser is processing a write operation, and the render thread can't render anything when it can't obtain the lock. If the write buffer is particularly large (as is the case here), then the renderer may be blocked for quite a long time.

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Nov 20, 2021
@j4james
Copy link
Collaborator Author

j4james commented Nov 20, 2021

This may seem like a contrived example, and it is, but it's based on a real issue I had when testing my sixel branch with the sixel-bench benchmark. Much like the above animation, it caused the console to hang for a couple of seconds, and eventually just displayed the final frame of the video.

I suppose sixel-bench is really a contrived example too, so not an urgent problem (especially since we don't have sixel yet), but this definitely does feel like a bug - I don't think it should be possible to hang the console like that.

@elsaco
Copy link

elsaco commented Nov 21, 2021

@j4james this issue could be Windows version specific. You're using Windows 10. On version 10.0.2200 (Windows 11) after setting Windows Console Host as the default terminal application in SUI, and WT restart, this is the output:

conhost_rotate_test.mp4

WT version 1.11.2921.0. Same result when WT is the default terminal application. This post is for reference only!

@j4james
Copy link
Collaborator Author

j4james commented Nov 21, 2021

Huh. That's weird. I wouldn't expect the Windows version to have any effect, but I wonder if it has something to do with the version of WSL you're using (I'm on v1). This test relies on Python passing the write call through to conhost as a single output operation. I've tested both Python 2 and 3, so the Python version shouldn't be a factor, but it's possible the WSL version may affect things.

I'll try and rewrite the test case in C, with a direct call to WriteConsole, which should guarantee the buffer won't be split up by some intermediate code.

Thanks for attempting to replicate the issue though.

@j4james
Copy link
Collaborator Author

j4james commented Nov 21, 2021

Here's the equivalent test case rewritten in C++ with a WriteConsole call. But it'll now need to be run from a Windows shell like cmd or PowerShell.

#include <windows.h>
#include <string>
#include <cmath>

void main()
{
  HANDLE handle = GetStdHandle(STD_OUTPUT_HANDLE);

  DWORD mode;
  GetConsoleMode(handle, &mode);
  mode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
  SetConsoleMode(handle, mode);

  CONSOLE_SCREEN_BUFFER_INFOEX csbie;
  csbie.cbSize = sizeof(csbie);
  GetConsoleScreenBufferInfoEx(handle, &csbie);

  const auto w = csbie.dwSize.X;
  const auto h = csbie.srWindow.Bottom - csbie.srWindow.Top + 1;
  const auto mx = w/2;
  const auto my = h/2;
  const auto pi = 3.14159265358979323846;

  printf("Generating content...\n");
  std::string s = "\033[?25l";
  for (auto i = 0; i < 512; i++) {
    s += "\033[H";
    for (auto y = 0; y < h; y++) {
      for (auto x = 0; x < w; x++) {
        auto dy = y-my;
        auto dx = x-mx;
        auto a = std::atan2(dy*2,dx) + pi;
        auto c = std::to_string((int(a/pi*127)+i)%256);
        s += "\033[38;2;"+c+";"+c+";"+c+"m*";
      }
    }
  }
  s += "\033[?25h\033[m\033[H";

  printf("Starting animation...\n");
  DWORD written = 0;
  WriteConsoleA(handle, s.c_str(), s.length(), &written, NULL);
}

@elsaco
Copy link

elsaco commented Nov 21, 2021

@j4james this is the output with the exe version of rotate in 1st tab, and python/wsl in 2nd tab:

rotate_test.mp4

Changing default terminal application setting did not make any difference in any tab.

@elsaco
Copy link

elsaco commented Nov 21, 2021

@j4james when running the python script in pwsh, it works but rendering seems slower compared to wsl test:

wt_rotate_pwsh.mp4

The same size = ['30', '120'] was used as in the wsl script.

@j4james
Copy link
Collaborator Author

j4james commented Nov 21, 2021

I should have mentioned I didn't expect to see the problem on the Windows version of Python, because I know it handles I/O differently to Linux, and the output gets split into chunks. But judging from your results, it looks like the same thing can apply to Linux.

At least the C++ version confirms the problem, although your system is clearly a lot faster than mine. so you're not seeing much of a delay before it renders the final frame.

@elsaco
Copy link

elsaco commented Nov 22, 2021

@j4james what is the difference between printf and writeconsole? This is what I've changed after line 38:

  printf("Starting animation...\n");
  printf("%s", s.c_str());

and this is the output:

wt_printf_rotate.mp4

@zadjii-msft
Copy link
Member

You know, I think @lhecker actually fixed this recently, but I'd assume the fix isn't in 10.0.19041.1348

@zadjii-msft
Copy link
Member

gh-11794

@lhecker
Copy link
Member

lhecker commented Nov 22, 2021

Yeah I had regular freezes in WT of up to 30 seconds, when running the application at high frame rates.

Since it was a heisenbug and only occurred when you weren't observing it (with a debugger) I could never actually proof the reason for the issue. But I believe I could determine it regardless: Locks like SRWLOCK are unfair locks and under certain circumstances it can prevent threads from progressing for a long time.

Based on this theory I changed the buffer lock in WT into a til::ticket_lock and the issue hasn't occurred for me since.
Integrating til::ticket_lock into conhost could be slightly more difficult as it uses critical sections and might rely on its support for recursive locking.

@j4james
Copy link
Collaborator Author

j4james commented Nov 22, 2021

You need to test with the C++ example - the Python version doesn't seem to work for everyone (I'm guessing because you're all on a later version of WSL to me). I can reproduce this with a recent version of conhost built from source and a recent version of Windows Terminal built from source.

@j4james
Copy link
Collaborator Author

j4james commented Nov 22, 2021

what is the difference between printf and writeconsole?

@elsaco printf is going to build up the output string in a temporary buffer, which it will periodically flush when full. That results in a bunch of small WriteConsole calls instead of one big one, which defeats the purpose of the test.

@j4james
Copy link
Collaborator Author

j4james commented Nov 22, 2021

Btw, this is what it looks like for me in OpenConsole and Windows Terminal, both built from commit a68b0d4.

Rotate2.mp4

@elsaco
Copy link

elsaco commented Nov 23, 2021

@j4james if the same rotate.exe is run from a WSL prompt the delay is clearly visible. Also, please notice how the last frame looks. There's about 4-5 sec between Starting animation... and end of test:

wt_wsl_rotate.mp4

@zadjii-msft zadjii-msft added Area-Performance Performance-related issue Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-1 A description (P1) Product-Conhost For issues in the Console codebase labels Nov 23, 2021
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Nov 23, 2021
@zadjii-msft zadjii-msft added this to the Terminal v2.0 milestone Nov 23, 2021
@lhecker
Copy link
Member

lhecker commented Nov 28, 2021

After moving LockConsole / UnlockConsole into WriteConsoleWImplHelper and limiting the chunk size passed to DoWriteConsole to 2kB:

98aQbqsgwKpEUvcN.mp4

After additionally replacing the _csConsoleLock critical section with:

static thread_local ULONG recursionCount = 0;
static til::ticket_lock lock;

...and increasing the chunk size to a more reasonable 128kB (since the ticket lock is now being "fair"):

aMg2dnhJAHLmWdAj.mp4

Unfortunately however, chunking the input is likely not a viable option, as some applications rely on the implicit promise that a single write() with VT is processed synchronously. For instance:

GPABiTKwPytBxTNM.mp4

@j4james
Copy link
Collaborator Author

j4james commented Nov 28, 2021

Unfortunately however, chunking the input is likely not a viable option, as some applications rely on the implicit promise that a single write() with VT is processed synchronously.

That's a good point. So maybe this is just the caller's responsibility then, and the issue could be closed as "won't-fix". My main concern was the failure of sixel-bench, but it looks like I'm the only own that can reproduce the issue in Python (which is what sixel-bench is using), so it's likely that could just be resolved by an update of Windows or WSL.

@lhecker
Copy link
Member

lhecker commented Nov 28, 2021

@j4james But at least one improvement will result out of this: As you can see in my video above, even 2kB chunks of data can cause the output to lag in conhost (and ConPTY) if an application is writing VT sufficiently fast. By switching to a fair mutex (til::ticket_lock) we can improve the user experience. I'm going to submit a PR for that soon. 🙂
The performance hit of that seems to be less than 1.5% in my benchmarks.

@j4james
Copy link
Collaborator Author

j4james commented Nov 28, 2021

By switching to a fair mutex (til::ticket_lock)

Just be aware that there might be a few places in conhost where the lock is acquired recursively (I'm fairly certain I've come across some in the past). I think I remember you saying that the ticket_lock wasn't re-entrant, which I assume would be problematic.

I don't think it would be that difficult to refactor the code to avoid recursive locks, but it might be a bit of work trying to track down all the cases that need to be dealt with.

@lhecker
Copy link
Member

lhecker commented Nov 28, 2021

I don't believe it's all too difficult to make a mutex save for reentrancy. My approach is basically:

static thread_local ULONG recursionCount = 0;
static til::ticket_lock lock;

void LockConsole() {
    const auto rc = ++recursionCount;
    if (rc == 1) {
        lock.lock();
    }
}

void UnlockConsole()
{
    const auto rc = --recursionCount;
    if (rc == 0) {
        lock.unlock();
    }
}

It's missing safety-checks for over- and underflow, but otherwise it works fine.

@zadjii-msft zadjii-msft removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Dec 2, 2021
@ghost ghost added In-PR This issue has a related PR and removed In-PR This issue has a related PR labels Dec 7, 2021
@zadjii-msft zadjii-msft removed the Priority-1 A description (P1) label Jan 4, 2022
@zadjii-msft zadjii-msft modified the milestones: Terminal v2.0, 22H2 Jan 4, 2022
DHowett pushed a commit that referenced this issue Jan 14, 2022
This commit replaces the console lock was replaced with a fair ticket
lock implementation, as only fair locks guarantee us that the renderer
(or other parts) can acquire the lock, once the VT parser has finally
released it.

This is related to #11794.

## Validation Steps Performed
* No obvious crashes ✅
* No text/VT performance regression ✅
* Cursor blinker starts/stops on focus/defocus of the window ✅
@zadjii-msft zadjii-msft modified the milestones: 22H2, Backlog Jul 5, 2023
@lhecker
Copy link
Member

lhecker commented Aug 17, 2024

I just randomly came across this issue while searching for sixels of all things.
I think we can close this now, in particular with the new passthrough support. BTW the animation from the C++ snippet now works perfect (again?)!

@lhecker lhecker closed this as completed Aug 17, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Tag-Fix Doesn't match tag requirements label Aug 17, 2024
@j4james
Copy link
Collaborator Author

j4james commented Aug 17, 2024

@lhecker This still hangs for me in conhost, which it what the issue was originally about. And while it's improved in Windows Terminal, it still has a significant delay before you see anything (assumedly because conhost has to process the buffer first). This is most obvious when running the sixel-bench test.

@lhecker lhecker reopened this Aug 17, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Tag-Fix Doesn't match tag requirements label Aug 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase
Projects
None yet
Development

No branches or pull requests

4 participants