Skip to content
This repository has been archived by the owner on Mar 5, 2022. It is now read-only.

Smarter colorization and better support for native terminals on Windows #277

Merged
merged 2 commits into from
Apr 10, 2019

Conversation

zmwangx
Copy link
Collaborator

@zmwangx zmwangx commented Apr 10, 2019

This is really two related PRs crammed into one so as to not create any branch dependency.

Let me paste in the commit messages first:

Introduce smarter coloring and --colorize option

By default, do not colorize when stdout is not a tty. The --colorize option,
with possible values 'auto', 'always', and 'never' -- defaults to 'auto' when
not specified, and registers as 'always' when specified without a value -- is
similar to the --color option found in many well known *nix tools, with
coreutils ls(1) being a notable example. (Our option is named --colorize since
we already have a --colors option.) The old -C, --nocolor option is now a
shortcut for --colorize=never.

Limitation: due to limitations of Python argparse, specifically not being able
to make the equal sign between a long option and its value mandatory, one might
expect the following to work:

googler --colorize google

but it does not, due to "google" being parsed as an argument to --colorize. One
has to write

googler --colorize -- google

instead.

Note: Initially I considered "smartly" enabling/disabling coloring on Windows consoles, so it is only fair to pick the cross-platform low-hanging fruit in terms of smart colorization, i.e., tty detection, first. Upon further consideration I rejected the previous decision, but this change was already made, and it seems okay to leave it as is.

Enable ANSI color in cmd and PowerShell on Windows 10

VT100 control sequences are supported in cmd and PowerShell starting from
Windows 10 Anniversary Update, but only if the
ENABLE_VIRTUAL_TERMINAL_PROCESSING flag is set on the screen buffer handle
using SetConsoleMode.

Setting this flag does not seem to negatively affect third party terminal
emulators with native ANSI support such as ConEmu, so hopefully there's no
regression.

References:
https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences
https://docs.microsoft.com/en-us/windows/console/setconsolemode

Credits:
#275
tartley/colorama#139

Further explanations on some decisions

Not included in the commit messages, but worth mentioning:

  • I tested this on latest Windows 10 (Pro 1809, 17763.379) native cmd.exe, PowerShell.exe and latest ConEmu preview build. I'll just assume it works in other third party terminal emulators too. I won't bother to test this on Windows 7. And no one uses Windows 8 anyway.

  • The key difference between this and Enable/Disable ANSI Sequences on Windows. #275 is that we do not disable color on win32 just because it's not Windows 10 or because the ctypes code failed for whatever reason. Sure, you might still see garbage by default when using googler on Windows 7 or pre-1607 Windows 10 native cmd/PowerShell, but people using an ANSI-capable terminal emulator on those systems aren't suddenly punished and have their color stripped. Progressive enhancement fixing part of the issue is better than aiming for correctness at the cost of introducing regressions.

    By the way, I do hope poor souls stuck on Windows at least help themselves with a more modern terminal emulator, instead of MS crap that values compatibility with DOS more than usability in 2019.

  • Speaking of fixing part of the issue: what about Windows 7 or pre-1607 Windows 10? I'm not even sure pre-1607 Windows 10 exists at this point, but Windows 7 native console certainly isn't targeted by this fix. I looked at how colorama does this, and it's ugly as hell: they have to hijack the streams, extract the control sequences and translate them into win32 calls between writes. Not surprisingly, it's not Unix, so text is not the universal interface, erratic win32 calls are.

    Since googler prides itself on being a single script without dependencies, we have to cut corners here instead of implementing a full colorama-type solution here -- we certainly don't want to introduce and maintain 500 loc for 0.01% of users.

By default, do not colorize when stdout is not a tty. The --colorize option,
with possible values 'auto', 'always', and 'never' -- defaults to 'auto' when
not specified, and registers as 'always' when specified without a value -- is
similar to the --color option found in many well known *nix tools, with
coreutils ls(1) being a notable example. (Our option is named --colorize since
we already have a --colors option.) The old -C, --nocolor option is now a
shortcut for --colorize=never.

Limitation: due to limitations of Python argparse, specifically not being able
to make the equal sign between a long option and its value mandatory, one might
expect the following to work:

    googler --colorize google

but it does not, due to "google" being parsed as an argument to --colorize. One
has to write

    googler --colorize -- google

instead.
@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

By the way, sorry about the delay. I didn't do this last night because (1) you never know how much time you'll spend on learning/tinking with Windows stuff (turns out I spent a fair bit of time, even though I have a reasonable level of experience with using ctypes on Linux); (2) I need a clear head to make non-technical UX decisions.

VT100 control sequences are supported in cmd and PowerShell starting from
Windows 10 Anniversary Update, but only if the
ENABLE_VIRTUAL_TERMINAL_PROCESSING flag is set on the screen buffer handle
using SetConsoleMode.

Setting this flag does not seem to negatively affect third party terminal
emulators with native ANSI support such as ConEmu, so hopefully there's no
regression.

References:
https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences
https://docs.microsoft.com/en-us/windows/console/setconsolemode

Credits:
jarun#275
tartley/colorama#139
@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

I noticed PR building isn't turned on for CircleCI. I turned it on at https://circleci.com/gh/jarun/googler/edit#advanced-settings, but I don't want to enable "Pass secrets to builds from forked pull requests", since as they say a malicious PR could otherwise reveal secrets... We should really move NUM_TEST_ITERATIONS and SLEEP_DURATION to config.yml instead. (While we're at this, I realize I promised a Travis purge, so this will be part of that. It will have to wait until later tonight or tomorrow though.)

@jarun jarun merged commit 5fd59b3 into jarun:master Apr 10, 2019
@jarun
Copy link
Owner

jarun commented Apr 10, 2019

Thank you!

@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

@guilt Your request is now implemented with some changes, thanks.

@jarun
Copy link
Owner

jarun commented Apr 10, 2019

I will port it to ddgr.

@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

My decisions should be adequately explained above. One thing I forgot to mention: I dropped the anniversary update version check because this sort of string comparison is not robust at all. If SetConsoleMode doesn’t work, so be it, we just fallback to default behavior.

@jarun
Copy link
Owner

jarun commented Apr 10, 2019

We may want to depreciate -C at some point down the line OR redefine it to --colorize.

jarun added a commit to jarun/ddgr that referenced this pull request Apr 10, 2019
@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

—nocolor and —colorize without argument are.complete opposites, so it might.not be a great idea to rebrand -C. The point of —colorize=auto is that it should do the right thing 99% of the time, so you basically never need —colorize. We are basically reducing most -C use cases (piping) to the default. For the remaining 1%, —nocolor as a shortcut should still be the prevalent use case.

@jarun
Copy link
Owner

jarun commented Apr 10, 2019

Yes, I thought in the same line first; but it just felt like too many options for handling colors. Anyway, we are good I guess.

@zmwangx
Copy link
Collaborator Author

zmwangx commented Apr 10, 2019

curl has loads of options too, with many groups of options doing roughly the same things with very slight variations or some options being very specific shortcuts for others. It's of course nice to keep everything pure and simple, but for mere mortals practicing iterative designs, not unnecessarily breaking existing usage is probably more important than purity :)

@jarun
Copy link
Owner

jarun commented Apr 10, 2019

;)

@guilt
Copy link

guilt commented Apr 10, 2019

@zmwangx

Thank you. Also thanks for using the byref correctly. If you wanted to use this for the stderr handle, where printerrwrites some ANSI text, use -12.

try:
from ctypes import windll, wintypes, byref
kernel32 = windll.kernel32
stdout_handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add STD_ERROR_HANDLE = -12 and repeat this process. LGTM.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this and it turns out setting the mode flag for stdout makes stderr VT100-aware too. I know little about Windows, I guess it's because the two streams are sharing the same tty device (lacking Windows-specific terminology)? Like on *nix, tcsetattr(3) and friends from termios apply to the tty device and are not specific to any stream.

I can't think of case where someone would split stdout and stderr on two ttys, so we're probably good here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The messages sent to stderr through printerr will contain bad sequences, just like the original problem was with stdout. I guess it should not make a difference on non-Windows, but it sure makes it usable on Windows.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zmwangx and @guilt, do you guys understand that all these low-level hacks we are pulling out of our sleeves is because of the fact that we are trying to support a terminal that doesn't understand escape sequences in 2019?

We had an option to disable colors completely and there are terminal emulators on Windows which understand color codes.

Here's my take - we have spent quality time on this one. We are still discussing what we need to do to make it work on a terminal the source of which is not available to us.

Can we revert this patch and take any of the following paths:

  • document clearly that people using native terminals on Windows should be using the -C option?
  • if that looks inadequate, if we detect the platform is Windows we show a constant message at the beginning to tell the user that he may have to use the option -C if he is seeing ANSI escape codes
  • set -C to true by default on Windows as the probability is high the users may be using those terminals which are provided to them. And we document that to have colors on Windows one has to set C to false.

Any of these would be much more elegant than stuff we are wasting our time on now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guilt

The messages sent to stderr through printerr will contain bad sequences, just like the original problem was with stdout.

Yeah, I’m saying after setting ENABLE_VIRTUAL_TERMINAL_PROCESSING on stdout, stderr understands ANSI color too, at least on my machine. The prompt use reverse video which works alright in both cmd and PowerShell. Do you have a problem there? (I’m just using tcsetattr, which I understand well, as a model to guess why it works. Both fd 1 and fd 2 eventually points to /dev/pts/something which tsetattr works upon, and I assume it works similarly on Windows.)

Can we revert this patch and take any of the following paths...

I made sure this patch is strictly improvements without breaking anything that was working previously, so I don’t see why it needs be reverted. No color by default or always on warning are too heavy-handed and I myself as a ConEmu/Cmder user (when I’m using Windows, that is) won’t appreciate — color was working fine the first time I ported googler to Windows, that was what, two to three years ago? Adding a note to README is fine, it’s already full of things like this.

Also, consider this: Windows 7 EOL is near, Windows 8 is universally hated and admitting to using it in public is a no no ;), Windows 10 auto updates so probably no one is using a pre-1607 version, so this patch really covers almost the entire supported base of Windows users. The rest are like Python 3.3 users and who cares.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, on second thought I’ll set the flag for stderr too since it doesn’t really hurt... At most we set 7 | 4 = 7 which is a noop.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see what you mean.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually 7= ENABLE_PROCESSED_OUTPUT | ENABLE_WRAP_AT_EOL_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING

The reason I suggested stderr is because of lines like:

googler/googler

Line 2103 in 1d3f70d

printerr('\x1b[1mInspect the DOM through the \x1b[4mtree\x1b[24m variable.\x1b[0m')

I'm glad you folks are taking this forward. Really appreciate it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENABLE_PROCESSED_OUTPUT and ENABLE_WRAP_AT_EOL_OUTPUT are the default, so when you get stdout console mode you get 3, you set it to 3 | 4 = 7; then when you get stderr console mode, which is the same console, you get 7, so setting it to 7 | 4 = 7 is a noop. Again, it doesn't hurt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#280.

zmwangx added a commit to zmwangx/googler that referenced this pull request Apr 11, 2019
Follow-up on jarun#277.

As discussed here[1], it's not really necessary when stdout and stderr are the
same console, but it doesn't hurt either.

Note that we did not test sys.stderr.isatty() in the code. This has been
considered, and deemed not a problem, since stdout is handled first (what
really matters) and we have a catch-all pass for any possible exception.

[1] jarun#277 (comment)
@zmwangx zmwangx deleted the smart-color branch April 12, 2019 15:56
@lock lock bot locked and limited conversation to collaborators Nov 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants