Unquoted indentation-based string literals #248

Closed
haxscramper opened this issue Sep 2, 2020 · 9 comments

@haxscramper

haxscramper commented Sep 2, 2020

Proposal

Add unquoted indentation-based string literals with the following syntax:

herestring:
  this is an unquoted text that will be treated
  as string literal

• Does not introduce any new AST elements
• Proof-of-concept implementation is a single edit to lexer.nim, contained in ~60 lines of code.

Note: I'm of course open to suggestions and comments about the implementation. This is by no means the final implementation; I just added it to have something to show (since this RFC is purely syntax sugar).

Implementation

When the identifier herestring is found during lexing, it is replaced with a triple-quoted string literal token. All following text indented by at least the current line's indentation + 2 is cut out as the string's content.

herestring:
  [ this code will be ]
    [ treated as string literal]
  ^^
  Indentation will be preserved

# This comment has indentation smaller than 0 + 2 and will
# be treated as regular comment
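
The extraction rule described above can be sketched outside the lexer as a plain proc over source lines. This is only an illustration, not the code from the proof-of-concept patch: extractHerestring and leadingSpaces are hypothetical names, and the handling of blank lines (kept inside the literal, dropped at the end) is an assumption the proposal does not spell out.

```nim
import strutils

proc leadingSpaces(line: string): int =
  ## Number of leading space characters in `line`.
  result = 0
  while result < line.len and line[result] == ' ':
    inc result

proc extractHerestring(lines: seq[string], start: int): string =
  ## `start` is the index of the line that contains `herestring:`.
  ## Hypothetical helper, not part of lexer.nim.
  let bodyIndent = leadingSpaces(lines[start]) + 2
  var body: seq[string] = @[]
  for i in start + 1 ..< lines.len:
    let line = lines[i]
    if line.strip().len == 0:
      body.add ""                      # assumption: blank lines stay inside the literal
    elif leadingSpaces(line) >= bodyIndent:
      body.add line[bodyIndent .. ^1]  # drop the common prefix, keep extra indentation
    else:
      break                            # first less-indented line ends the literal
  while body.len > 0 and body[^1].len == 0:
    discard body.pop()                 # trailing blank lines are not part of the literal
  result = body.join("\n")

when isMainModule:
  let src = @[
    "herestring:",
    "  [ this code will be ]",
    "    [ treated as string literal]",
    "",
    "# This comment ends the literal",
  ]
  echo extractHerestring(src, 0)
```

The extra two spaces on the second body line survive, which is the "indentation will be preserved" behaviour from the example, and the comment in column 0 terminates the literal.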

Example of use

I implemented some tests for the initial implementation.

String format

echo fmt herestring:
  Long text with some {interpolated} elements. You can of course
  write it as regular triple string literal, but then you would
  have to either de-indent it afterwards or move it to the first column.

Just as string literal

The Nim compiler test suite uses triple-quoted string literals to
configure individual tests. This:

discard """
  errormsg: "expected: ':', but got: 'echo'"
  file: "tinvcolonlocation1.nim"
  line: 8
  column: 7
"""

can be turned into this:

discard herestring:
  errormsg: "expected: ':', but got: 'echo'"
  file: "tinvcolonlocation1.nim"
  line: 8
  column: 7

which is not much of a difference from a feature standpoint, but a lot of people will find the herestring version easier to work with, and it looks better (this is of course subjective).

Emit statement

{.emit: herestring:
  try {
      auto tmp = (itr != itrEnd);
  } catch (const boost::wave::preprocess_exception& ex) {
      cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()
           << endl;
      cerr << ex.description() << endl;
      return 1;
  }
.}

@Varriount

Aside from the indentation, what does this improve over triple quoted strings?

@SolitudeSF

SolitudeSF commented Sep 3, 2020

> GitHub syntax highlighter works relatively fine on the new syntax.

The GitHub highlighter breaks all the time with the current syntax, and this for sure won't make it better. Also, I have no clue how I would add regex-based syntax highlighting for this.

@haxscramper
Author

I don't think it is necessary to explicitly add highlighting code for this specific new type of literal. On the contrary, I'd say there is absolutely no need to highlight it as a string literal, or as any kind of specific construct. To repeat what I already said in the RFC: a lot of languages share keywords and syntax.

Here is an example of how VS Code currently handles it. I think it looks pretty good. The only thing that needs to be highlighted separately is herestring:, which can be treated as a new keyword.

1. Better syntax highlighting in cases where it is necessary to have foreign code literals

Screenshot taken from VS Code with zero additional configuration.

[image]

2. No manual indenting or unindenting is necessary.

Currently, to get a correctly indented string literal it is necessary to write

import strutils

echo """
  try {
      auto tmp = (itr != itrEnd);
  } catch (const boost::wave::preprocess_exception& ex) {
      cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()
""".unindent()

to get this string:

try {
auto tmp = (itr != itrEnd);
} catch (const boost::wave::preprocess_exception& ex) {
cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()

This happens because, according to its documentation, strutils.unindent "Removes all indentation composed of whitespace from each line in s." (emphasis mine). The string you get is therefore not unindented but rather stripped of leading whitespace on every line, which is not the same thing. And with unindent, string literals look like """ """.unindent() and require importing strutils.

It is of course a non-issue to write a function that uniformly unindents a string.
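
For illustration, a minimal sketch of such a function, assuming indentation consists of spaces only. The name unindentUniform is hypothetical and not part of strutils; it removes only the indentation shared by all non-blank lines, so relative indentation is kept.

```nim
import strutils

proc unindentUniform(s: string): string =
  ## Removes the indentation common to all non-blank lines of `s`.
  ## Hypothetical helper; assumes space-only indentation.
  var common = int.high
  for line in s.splitLines():
    if line.strip().len > 0:
      var n = 0
      while n < line.len and line[n] == ' ':
        inc n
      common = min(common, n)
  if common == int.high:
    return s                        # no non-blank lines, nothing to remove
  var lines: seq[string] = @[]
  for line in s.splitLines():
    if line.len > common:
      lines.add line[common .. ^1]  # strip only the shared prefix
    else:
      lines.add ""                  # blank or very short line
  result = lines.join("\n")

when isMainModule:
  echo """
  try {
      auto tmp = (itr != itrEnd);
  }
""".unindentUniform()
```

The call site still has the """ """.unindentUniform() shape, so this addresses only the stripping behaviour, not the verbosity.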

With the new syntax it looks like this:

herestring: # Random C++ code
  try {
      auto tmp = (itr != itrEnd);
  } catch (const boost::wave::preprocess_exception& ex) {
      cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()
           << endl;

@narimiran
Member

narimiran commented Sep 3, 2020

> Since according to documentation of strutils.unindent it "Removes all indentation composed of whitespace from each line in s." (emphasis mine). Which means the string you get is not unindented but rather 'stripped on leading whitespaces', which is not the same thing.

There's also another unindent overload, namely proc unindent(s: string; count: Natural; padding: string = " "): string, which works as intended when you pass the correct count:

import strutils

echo """
  try {
      auto tmp = (itr != itrEnd);
  } catch (const boost::wave::preprocess_exception& ex) {
      cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()
""".unindent(2)      # <----------- notice `2` here ------------------

produces:

try {
    auto tmp = (itr != itrEnd);
} catch (const boost::wave::preprocess_exception& ex) {
    cerr << "ERROR in " << ex.file_name() << " : " << ex.line_no()

@haxscramper
Author

That requires you to specify the indentation in every single string literal that you write in the code. It is perfectly doable and quite easy, but now string literals look like """ """.unindent(n), where n depends on the indentation of the surrounding code.

@SolitudeSF

and how would that look?

herestring:
  #[

othercode()

@haxscramper
Author

haxscramper commented Sep 3, 2020

Remark: the only language I know of that treats #[ as the start of a multiline comment is Nim.

Yes, of course it is possible to find innumerable edge cases that break every possible syntax highlighter, but the only two things that break everything further down the file are unclosed strings and #[ / ##[ comment pairs.

@haxscramper
Author

Since the majority voted against it and there is no real feature benefit, only new syntax sugar, I think it is appropriate to close the RFC.

@Araq
Member

Araq commented Sep 7, 2020

Thank you very much for your sportsmanship!
