AnnotatedStrings, and a string styling stdlib #49586

Merged: 17 commits, Oct 20, 2023

tecosaur
Contributor

@tecosaur tecosaur commented May 1, 2023

Styled content is hard to deal with. This contribution aims to drastically improve that.

Closes #41791, closes #41435, and I think also resolves #40228 and resolves #28690.

New public API

Within Base:

  • AnnotatedString
  • AnnotatedChar
  • annotatedstring

From the stdlib:

  • Base.Face
  • addface!
  • @S_str

Motivation

Julia already treats the REPL experience better than most other languages out there (🐍), and I think it's great how often I see the REPL as a selling point of Julia.

However... styling is hard. Across Base and the package ecosystem we have a painful mixture of raw ANSI codes and an unsightly pile of printstyled lines. This isn't just ugly; it leads to recurring issues with incorrectly computed string lengths/padding and unterminated styling codes (e.g. #37568 and #45521).
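
To make that class of bug concrete, here is a minimal sketch that uses only Base functions. The escape sequences are the standard SGR codes for "red foreground" and "reset"; the variable names are purely illustrative.

```julia
# A string styled with raw ANSI escape codes (red foreground, then reset).
plain = "hi"
ansi  = "\e[31m" * plain * "\e[0m"

# The escape codes count as characters, so length-based layout is thrown off.
println(length(plain))  # 2
println(length(ansi))   # 11

# Padding computed from the character count comes out empty, because the
# string already looks "wider" than the six columns we asked for.
padding = " "^max(0, 6 - length(ansi))
println(isempty(padding))  # true

# Truncating by character count drops the reset code, so the styling is
# left unterminated and leaks into whatever is printed next.
truncated = first(ansi, 6)
println(endswith(truncated, "\e[0m"))  # false
```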

We can do better than this. Rethinking our approach to styled content allows us to do away with this class of bugs/headaches altogether, and gain a number of shiny new features at the same time.

My hope is that the capabilities introduced by this PR will lead to a much more robust approach to string styling and, by making it easier to produce well-styled content, to even more beautiful REPL experiences across the board.

Rundown of changes

A StyledString type is introduced that effectively sits on top of other string types. It wraps another AbstractString but then separately stores a list of attributes applied to particular regions of the wrapped string. Regions are represented by a UnitRange{Int}, and attributes as a Pair{Symbol, Any}.

This effectively creates "content" and "attributes" layers, allowing both to be handled much better (more conveniently and more robustly) than when they are mixed together. Attributes can be anything, which allows for more than just styling information, e.g. hyperlinks, or other data that makes sense to attach to a region of a string, like source location information. Really StyledString should be called PropertizedString, but I think that's much less catchy 😛.
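
The two-layer idea can be sketched with a toy type. This is an illustration only, not the type added by this PR: `ToyStyledString` and `attributes_at` are hypothetical names, and only the minimum of the AbstractString interface is forwarded to the wrapped content.

```julia
# Hypothetical toy type, for illustration only; not the type added by this PR.
struct ToyStyledString <: AbstractString
    content::String
    # Each entry pairs a region of `content` with an attribute.
    attributes::Vector{Tuple{UnitRange{Int}, Pair{Symbol, Any}}}
end

# Forward the core string interface to the wrapped content, so generic
# string functions see only the text and never the attributes.
Base.ncodeunits(s::ToyStyledString) = ncodeunits(s.content)
Base.codeunit(s::ToyStyledString) = codeunit(s.content)
Base.codeunit(s::ToyStyledString, i::Integer) = codeunit(s.content, i)
Base.isvalid(s::ToyStyledString, i::Integer) = isvalid(s.content, i)
Base.iterate(s::ToyStyledString, i::Int=firstindex(s.content)) = iterate(s.content, i)

# Hypothetical helper: all attributes whose region covers index `i`.
attributes_at(s::ToyStyledString, i::Integer) =
    [attr for (region, attr) in s.attributes if i in region]

attrs = Tuple{UnitRange{Int}, Pair{Symbol, Any}}[]
push!(attrs, (1:5, Pair{Symbol, Any}(:face, :bold)))
push!(attrs, (7:11, Pair{Symbol, Any}(:link, "https://julialang.org")))
s = ToyStyledString("hello world", attrs)

println(length(s))            # counts only the content: 11
println(attributes_at(s, 8))  # the attributes attached at index 8
```

Because the attributes live beside the text rather than inside it, length and padding calculations never see them.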

A StyledChar type is also added, which mirrors StyledString but wraps an AbstractChar instead.

To handle styling information, a new type is added to contain all possibly relevant styling information — Face. This is very much inspired by https://www.gnu.org/software/emacs/manual/html_node/elisp/Faces.html, as I have found this system to work rather well in practice (why invent a new approach when there's a lovely battle-tested one we can be inspired by? 😉).

The main way faces work is by having a global Dict{Symbol, Face} of face names. This has a number of advantages:

  • It makes it easy to reference faces by their name
  • Packages can easily add new faces
  • It makes it possible for faces to inherit from other faces (e.g. have a :julia_help_prompt face that inherits from :julia_prompt)
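
A rough sketch of how a name-keyed registry with inheritance might behave is given below. `ToyFace`, `TOY_FACES`, `pick`, and `resolve` are hypothetical names invented for this illustration; the real Face type carries many more attributes, and this sketch does no cycle detection.

```julia
# Hypothetical, simplified stand-in for a face: a few optional attributes
# plus the name of a parent face to inherit unset attributes from.
struct ToyFace
    foreground::Union{Symbol, Nothing}
    bold::Union{Bool, Nothing}
    inherit::Union{Symbol, Nothing}
end
ToyFace(; foreground=nothing, bold=nothing, inherit=nothing) =
    ToyFace(foreground, bold, inherit)

# The global registry of named faces.
const TOY_FACES = Dict{Symbol, ToyFace}(
    :julia_prompt      => ToyFace(foreground=:green, bold=true),
    :julia_help_prompt => ToyFace(foreground=:yellow, inherit=:julia_prompt),
)

# Prefer the face's own value, falling back to the parent's.
pick(own, inherited) = own === nothing ? inherited : own

# Resolve a face by name, filling unset attributes from its ancestors.
# Note: no cycle detection, so a face must not inherit from itself.
function resolve(name::Symbol, faces=TOY_FACES)
    face = faces[name]
    face.inherit === nothing && return face
    parent = resolve(face.inherit, faces)
    return ToyFace(pick(face.foreground, parent.foreground),
                   pick(face.bold, parent.bold),
                   nothing)
end

# A package adds its own face by inserting under a prefixed name.
TOY_FACES[:mypackage_warning] = ToyFace(foreground=:red, inherit=:julia_prompt)

help_face = resolve(:julia_help_prompt)
println(help_face.foreground)  # yellow (own value)
println(help_face.bold)        # true (inherited from :julia_prompt)
```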

Conversion functions to accept and load face specifications from a Dict{String, Any} are also added. By loading ~/.julia/config/faces.toml this resolves #41435.
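
As a small illustration of where such a Dict{String, Any} comes from, the TOML standard library parses a configuration fragment into nested dictionaries. The fragment below is an assumption made for this sketch; the face name and keys are illustrative, not a statement of the faces.toml schema.

```julia
using TOML

# Hypothetical faces.toml fragment; the face and key names are assumptions.
config = TOML.parse("""
[julia_prompt]
foreground = "blue"
bold = true
""")

# TOML tables arrive as nested Dict{String, Any} values, ready to be
# handed to a conversion function.
for (name, spec) in config
    println(Symbol(name), " => ", spec)
end
```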

To handle printing faces well, we need to inquire about the terminal a fair bit more, and so I've replaced tcap with a terminfo parser implemented in accordance with term(5).

Lastly, we have a string macro (@S_str) that was rather nasty to write, but is very convenient.

Currently unimplemented

  • replace
  • property removal

Usage examples

The example from the @S_str docs

image

Easy nested styling

image

No more issues with unterminated ANSI codes

image

String functions operate on the unstyled content, while preserving styling where possible

image

image

Styled text will actually automatically fall back to the un-styled form when printed to an io with :color != true.

image
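
The fallback relies on the `:color` property that Base already exposes on IO streams. A minimal sketch of the same check, using a hypothetical helper `print_maybe_red` and raw ANSI codes in place of the PR's styled printing:

```julia
# Hypothetical helper: emit ANSI codes only when the stream advertises colour.
function print_maybe_red(io::IO, text::AbstractString)
    if get(io, :color, false)::Bool
        print(io, "\e[31m", text, "\e[0m")
    else
        print(io, text)
    end
end

# A plain IOBuffer has no :color property, so the text is written unstyled.
plain_io = IOBuffer()
print_maybe_red(plain_io, "hi")
plain_out = String(take!(plain_io))

# Wrapping the buffer in an IOContext with :color => true enables styling.
color_io = IOBuffer()
print_maybe_red(IOContext(color_io, :color => true), "hi")
color_out = String(take!(color_io))
```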

Incrementally styling a string, and changing face definitions on-the-fly (note: this uses currently "private API" functions)

image

Fancier usage examples

The result of the example "banner using S"..."" commit

image

A fancier ^R implementation, that's been implemented using StyledStrings and JuliaSyntax that would be a massive pain to do via printstyled et al.

image

With very little work, docstring previews could look like this

image

@tecosaur tecosaur marked this pull request as draft May 1, 2023 18:37
@longemen3000
Contributor

seems similar to Crayons.jl ? https://github.com/KristofferC/Crayons.jl

@tecosaur
Contributor Author

tecosaur commented May 2, 2023

seems similar to Crayons.jl?

Only in the sense that both Crayons.jl and this PR provide a more thorough approach to ANSI styling than printstyled. However:

  • Crayons-marked content is not an AbstractString, and so does not work with string functions like lpad etc.
  • Crayons does not resolve either the "truncation of styled content leaving unterminated ANSI codes" issue or the "ANSI sequences mess with string calculations" issue
  • Most of the "Usage examples" are not possible with Crayons
  • Crayons does not adjust printing based on the terminal capabilities (e.g. support for italics, truecolor)
  • Crayons only supports some styles, not arbitrary attributes
  • Crayons is built around ANSI-code styling, this is more general
  • Crayons does not support stylistic inheritance, package-extended styles, or user-customisation of styles
  • Crayons is not part of Base, and so cannot be used to improve how styled content is handled in Base (and I'd argue doesn't go far enough to warrant inclusion … while I think this PR does)

I'd encourage you to take a closer look at this contribution.

Member

@KristofferC KristofferC left a comment

Just a set of basic comments that popped up from an initial reading of the code.

Review threads: base/regex.jl (2), base/strings/faces.jl (4), base/strings/styled.jl (3), base/strings/util.jl (1)

@bicycle1885
Member

It would be very nice if we could somehow unify these stylized strings and my format strings (https://github.com/bicycle1885/Fmt.jl) in Base.

@tecosaur
Contributor Author

tecosaur commented May 4, 2023

It would be very nice if we could somehow unify these stylized strings and my format strings (https://github.com/bicycle1885/Fmt.jl) in Base.

That's an interesting idea. String formatting and styling at once does seem like it could be nice, I imagine something like S"{warn:$(sqrt(2)):.2f}" could be done. However, I don't want to let perfect be the enemy of the good, so I'm wary of jumping on tying these two things together and making a harder to review/merge PR.

Perhaps we could leave a 'hole' in the S"..." syntax which would allow for formatting to be supported down the road?

@stevengj
Member

Packages can easily add new faces

Not composably, though? Having a mutable global namespace for this seems like it invites conflicts.

@tecosaur
Contributor Author

Packages can easily add new faces

Not composably, though?

Can you elaborate on what you mean by "composably"? We have inheritance, e.g. mypackage_repl_mode_prompt can inherit from julia_prompt and mypackage_main. In my mind (and from my experience with this system over the past few years), that should be sufficient.

Having a mutable global namespace for this seems like it invites conflicts.

A package could use its own face dictionary and withfaces, however this would fragment customisation, and easy (central) user customisation is another goal of this implementation. Currently, all face definitions can be modified by a user's faces.toml, which I think is highly desirable.

We can encourage package authors to use the face format name pkgname_facename, and so solve this potential issue via convention. Of course, there can be bad actors, but that's true of virtually any system we can dream up. For what it's worth, Emacs has had this system for at least 30 years, and I have yet to hear of anybody having any issues with face-name conflicts, and that's just thanks to this convention.

@tecosaur
Contributor Author

tecosaur commented May 16, 2023

Changes in this force-push: Made the styled string macro (@S_str) more interpolate-able (i.e. made it so that properties can be interpolated as well as content), and made styled strings with no interpolation resolve to a value instead of an expression.

@tecosaur tecosaur force-pushed the styled-strings branch 2 times, most recently from a68bb6f to 1f1607d Compare May 16, 2023 17:07
@tecosaur
Contributor Author

So far this has just received a few surface-level comments. Is there anything I can do to make this easier to review?

If anybody wants to discuss the design/implementation more, I'm also happy to voice chat etc. in addition to comments here. There seems to be a decent amount of interest in this functionality (by 🚀 reacts it's currently the №4 open PR), and it would be great to see this translate into movement towards making this a shoo-in for merging 😃.

@tecosaur tecosaur force-pushed the styled-strings branch 2 times, most recently from d0110af to 31b7a12 Compare May 18, 2023 11:27
@tecosaur
Contributor Author

tecosaur commented May 18, 2023

Because it's not much extra work, I've implemented show(::IO, ::MIME"text/html", ::StyledString).

<pre>               <span style="color: #008000;font-weight: 700;">_</span>
   <span style="color: #000080;font-weight: 700;">_</span>       _ <span style="color: #800000;font-weight: 700;">_</span><span style="color: #008000;font-weight: 700;">(_)</span><span style="color: #800080;font-weight: 700;">_</span>     <span style="color: #808080;"></span>  <span style="font-weight: 700;">Documentation:</span> <a href="https://docs.julialang.org"><span style="text-decoration: #808080 underline;">https://docs.julialang.org</a></span>
  <span style="color: #000080;font-weight: 700;">(_)</span>     | <span style="color: #800000;font-weight: 700;">(_)</span><span style="font-weight: 700;"> </span><span style="color: #800080;font-weight: 700;">(_)</span>    <span style="color: #808080;"></span>
   _ _   _| |_  __ _   <span style="color: #808080;"></span>  Type <span style="color: #808000;background-color: #3a3a3a;font-weight: 700;">?</span> for help, <span style="color: #000080;background-color: #3a3a3a;font-weight: 700;">]?</span> for <a href="https://pkgdocs.julialang.org/"><span style="text-decoration: #808080 underline;">Pkg</a></span> help.
  | | | | | | |/ _` |  <span style="color: #808080;"></span>
  | | |_| | | | (_| |  <span style="color: #808080;"></span>  Version <span style="font-weight: 700;">1.10.0-DEV.1328</span> <span style="font-weight: 300;">(<span style="">2023-05-16<span style="">)</span></span></span>
 _/ |\__'_|_|_|\__'_|  <span style="color: #808080;"></span>  <span style="color: #000080;">styled-strings</span>/<span style="color: #808080;">a68bb6f5cd*</span> (<span style="color: #008000;font-weight: 700;"><span style="font-weight: 400;font-style: italic;"> <span style="">11<span style=""> commits</span></span></span></span><span style="font-style: italic;">, </span><span style="color: #808000;"><span style="font-style: italic;"> <span style="">2<span style=""> <span style="">days</span></span></span></span></span>)
|__/                   <span style="color: #808080;"></span>

</pre>

If GitHub allowed span style in <pre>, this is what you'd see (in light mode):

image

Happy to take out if this seems like too much for Base, I just think it's a nice way of demonstrating another aspect of the improved flexibility of this approach as compared to printstyled etc. — the ability to easily support extension to different output modes.

@vtjnash
Member

vtjnash commented May 20, 2023

Is there anything I can do to make this easier to review?

Probably not. If there were a tiny piece we could merge first, that might help, but it is unclear whether that makes sense here. The simplest and most convincing change is to click "ready for review" so it doesn't look like you are still experimenting on it. Similarly, make sure the build is passing. Particularly for something this large, I don't want to iterate too many times on the review just fixing up bugs, since that tends to drown out the main discussion (GitHub in particular struggles with more than about 20 comments active at a time).

I am excited by this since, by the sound of things, this will mostly close out my old PR #27430 (though I can still look back through there later to see if I had any useful gadgets written to try to port to this). Showing that "text/html" was a sensible output there was also my demo to test whether it would be worth including, so I am glad to see it here now ;)

@vtjnash vtjnash added the needs tests, needs docs, and needs news labels May 20, 2023
@tecosaur
Contributor Author

Thanks for your comments @vtjnash, I was hoping that before marking this as "ready for review" and getting into the details more people would give some feedback on the overall design/approach.

If the overall design looks acceptable, I'll go ahead and write tests, docs, and make sure the build passes.

That said, I've received consistently positive feedback so far. @StefanKarpinski seemed keen on the idea when I discussed it on Slack, and above it seems like @KristofferC's comments were mainly nit-picks. So, perhaps I should just go ahead with the tests/docs/news anyway and mark this as a non-draft PR?

Oh, and from a glance at #27430 it looks like what we may be interested in lifting over is a few of the minor things like the pretty range printing. In terms of the actual system/approach itself, I'm pretty sure this PR subsumes it entirely, and by quite some margin too 😁.

@gbaraldi
Member

When a PR is marked as draft people think you are still playing with things and that lots may still change. When you mark it as ready for review then people will take a look.
Of course being able to build and having some tests/examples doesn't hurt for reviewing as well.
In any case great job with the PR. From a far away glance it looks awesome

@aplavin
Contributor

aplavin commented May 22, 2023

Can a longer, more descriptive name be used for the string macro instead of S"..."?

Compared to functions, it's much more involved to qualify string macro names (as in MyPackage.@S_str("...")). So it makes more sense not to take up short names unnecessarily, especially given that styled"abc" is easier to guess what it means. I've already seen a couple of S"..." in user code (and wrote myself), used to create either some custom types or just Base Symbols (S"abc def" as a shorter and more efficient Symbol("abc def")).

@tecosaur
Contributor Author

tecosaur commented May 22, 2023

I feel like S"styled" is pretty comparable to s"substitution". Descriptiveness is good, but so is terseness, particularly when I can see this being used a lot. While I'm with you in general regarding descriptive naming, I think there are instances where the convenience is worth going for something shorter, and I think this is one of them.

Besides, I don't think there's much need to guess what it means, for two reasons:

  1. I feel like if you see S"The {bold:{italic:quick} {(foreground=#cd853f):brown} fox} jumped over the {link={https://en.wikipedia.org/wiki/Laziness}:lazy} dog" (or similar) it's pretty obviously text + styling information.
  2. We have the lovely help?>, which while not a panacea makes it pretty trivial to go from "I wonder what S"" is?" to the docstring.

Before going with S"" I did a search and couldn't find much use of S"" in the wild: https://grep.app/search?q=macro%20S_str&case=true&filter[lang][0]=Julia, and in particular only found one use of S"" in a package written for use outside the package's internals (chakravala's DirectSum.jl).

Oh, and with regards to

Compared to functions, it's much more involved to qualify string macro names (as in MyPackage.@S_str("..."))

This actually isn't the case. See the following example:

julia> module Demo
       macro s_str(s)
       3
       end
       f() = s"test"
       end
Main.Demo

julia> Demo.s"hey"
3

@aplavin
Contributor

aplavin commented May 22, 2023

S"styled" is pretty comparable to s"substitution"

Almost forgot about substitution-strings. It makes the argument to avoid S"" for styled strings in Base even stronger.
Because really, upper and lower case macros doing completely different and unrelated things, and that's in Base?..

couldn't find much use of S"" in the wild

It's not huge, but more than that website shows: definitions and usages.

@tecosaur
Contributor Author

tecosaur commented May 22, 2023

Almost forgot about substitution-strings. It makes the argument to avoid S"" for styled strings in Base even stronger.

Case matters, and I can't see much potential for confusion. A similar complaint could be applied to r"" and raw"", b"", and big"".

It's not huge, but more than that website shows: definitions and usages.

Okay, looks like there are a very small number of other packages too. That said, most of those results look to be forks of an old Julia version that actually had S"" defined, and forks of DirectSum.

Similarly, if I look for style"" there are a bunch of matches for that too. I don't think it's vital to use a prefix that no-one uses, so much as one that's very rarely used, and my impression is that S"" easily clears this bar.

This allows for the construction of matches built on non-String
AbstractStrings.

These new types allow for arbitrary properties to be attached to regions
of an AbstractString or AbstractChar.

The most common expected use of this is for styled content, where the
styling is attached as special properties. This has the major benefit of
separating styling from content, allowing both to be treated better:
functions that operate on the content won't need variants that work
around styling, and operations that interact with the styling will have
far fewer edge cases (e.g. printing a substring and having to work
around unterminated ANSI styling codes).

Other use cases are also enabled by this, such as text links and the
preserving of line information in string processing.

To ease text styling, a "Face" type is introduced which bundles a
collection of stylistic attributes together (essentially constituting a
typeface). This builds on the recently added Styled{String,Char} types,
and together they allow for an ergonomic way of handling styled text.

To make specifying StyledStrings easier, the @S_str macro is added to
convert a minimalistic style markup to either a constant StyledString or
a StyledString-generating expression.

This macro was not easy to write, but seems to work well in practice.

Printing StyledStrings is more complicated than using the printstyled
function as a Face supports a much richer set of attributes, and
StyledString allows for attributes to be nested and overlapping.

With the aid of the newly added terminfo, we can now print a
StyledString in all its glory, up to the capabilities of the current
terminal, gracefully degrading italic to underline, and 24-bit colors to
8-bit.
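
One common way to degrade a 24-bit colour to the 8-bit palette is to snap each channel to the nearest level of the 6x6x6 colour cube used by xterm-compatible terminals. The sketch below shows that general technique and is not necessarily the mapping used in this PR; `CUBE_LEVELS`, `nearest_level`, `to_256`, and `fg_256` are hypothetical names, and the greyscale ramp (indices 232 to 255) is ignored for brevity. It needs Julia 1.7 or later for `argmin(f, itr)`.

```julia
# Channel intensities of the 6x6x6 colour cube in the xterm 256-colour palette.
const CUBE_LEVELS = (0, 95, 135, 175, 215, 255)

# Index (0-5) of the cube level closest to an 8-bit channel value.
nearest_level(v::Integer) = argmin(i -> abs(CUBE_LEVELS[i + 1] - v), 0:5)

# Approximate a 24-bit colour with a palette index in 16:231.
function to_256(r::Integer, g::Integer, b::Integer)
    return 16 + 36 * nearest_level(r) + 6 * nearest_level(g) + nearest_level(b)
end

# SGR sequence selecting the approximated colour as the foreground.
fg_256(r, g, b) = "\e[38;5;$(to_256(r, g, b))m"

println(to_256(255, 0, 0))      # 196
println(to_256(0, 0, 0))        # 16
println(to_256(255, 255, 255))  # 231
```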

When printing directly to stdout, there is a non-negligible overhead
compared to simply printing to an IOBuffer. Testing indicates 3
allocations per print argument, and benchmarks reveal a ~2x increase in
allocations overall and as much as a 10x increase in execution time.

Thus, it seems worthwhile to use a temporary buffer in all cases.
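
The buffering pattern described here can be sketched with Base alone. `buffered_print` is a hypothetical helper name, not a function from this PR:

```julia
# Hypothetical helper: assemble all output in memory, then write it once.
function buffered_print(io::IO, xs...)
    buf = IOBuffer()
    print(buf, xs...)      # many small writes go to the in-memory buffer
    write(io, take!(buf))  # a single write reaches the real stream
    return nothing
end

out = IOBuffer()
buffered_print(out, "a", 1, 'c')
result = String(take!(out))
println(result)  # a1c
```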

This is just nicer to look at in the REPL

This way, should any styled printing occur, regardless of whether a REPL
session is started, it will be handled correctly based on the current
terminal.

The previous S"" macro was essentially one giant for loop with a helper
function.  When adding support for inline face value interpolation, it
was clear that that approach was unmaintainable.  As a result, the
implementation has been completely rewritten.  The new S"" macro is more
maintainable, featureful, and correct, now with a documented EBNF
grammar and more validation during expansion.

tecosaur added a commit to JuliaLang/StyledStrings.jl that referenced this pull request Oct 20, 2023
With minimal changes in order to work, the styling code developed in
JuliaLang/julia#49586 is restructured here as a new standard library.

For context, see the following commits in which the system was
developed:
- JuliaLang/julia@c505b047ac44 (Introduce text faces)
- JuliaLang/julia@eada39b4e162 (Introduce a styled string macro @S_str)
- JuliaLang/julia@13f32f1510d6 (Implement styled printing of StyledStrings)
- JuliaLang/julia@ea24b5371368 (Buffer styled printing)
- JuliaLang/julia@98e9af49325c (Add text/html show method for styled strings)
- JuliaLang/julia@c214350944bf (Custom show methods for faces and simplecolor)
- JuliaLang/julia@4a9128d6b8db (Overhaul S"" macro)
- JuliaLang/julia@0569c57befe7 (Tests for styled strings and faces)
- JuliaLang/julia@f8192fe29c93 (Document StyledStrings and Faces)

Set Version to 1.11

After discussion on Triage, we've decided that the base
Styled{String,Char} types will be renamed to Tagged{String,Char} to
better indicate their versatility and kept in base, with everything else
moved out to a new StyledStrings standard library.

Instead of having a split between Tags/Annotations/Text properties
(regions of the string are annotated with tagged values, and this is a
property of the text), just have Tags/Annotations.

In line with this, the "properties" field of TaggedString/TaggedChar is
renamed to "annotations", and the getter/setter functions are renamed:
- textproperties -> annotations
- textproperty! -> annotate!

While we're at it, improve the docstrings and functions a bit.

In response to further naming discussion. Please, please, let this be
the last rename.

Along the way we have some docstring improvements and stricter macro
construction in the StyledStrings stdlib (erroring on invalid syntax,
instead of warning), with more informative messages.

@tecosaur
Contributor Author

after you rebase is this ready to merge from your end?

I've just done the rebase, and I think so! 🤞

I suspect we'll want to make some more tweaks to this before 1.11 is cut, but what we have here now is pretty solid and cleared by triage, so I think we should be good to slap merge at long, long last 🙂.

@LilithHafner LilithHafner merged commit abe4303 into JuliaLang:master Oct 20, 2023
3 of 6 checks passed
@LilithHafner
Member

Thank you for your patience, perseverance, and contribution! I can't wait to see what folks build with this :)

@tecosaur
Contributor Author

For continued bikeshedding of formatting, please direct your attention to JuliaLang/StyledStrings.jl#1 🙂.

KristofferC pushed a commit that referenced this pull request Oct 23, 2023
A new unit test is also added for the edge-case found, and a few details
of the test string adjusted to make it easier to reason about at a
glance.

-----

This seems to have slipped into #49586 when the `annotatedstring`
function had to be refactored to no longer use `eachstyle` (which was
moved into the stdlib), and escaped the unit tests for index correctness.
Labels: display and printing, don't squash, feature