asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints #210

timotheecour · 2020-04-11T22:23:24Z

Discussing here @krux02's proposal for a new asm/emit syntax (from nim-lang/Nim#13953 (comment)):

I personally don't like the pragma notation for emit. I also don't like multi line string literals. So this is my idea to avoid it:

template c_sizeof*(T: typedesc): int =
  var s: int
  # the symbols s and T are captured
  asm(efHere, [s, T]):
    s = sizeof(T); // remember, this is C code without quotes in Nim code
  s

This is how it can work: I use the asm keyword here instead of emit. Asm is a keyword in the language, therefore the Nim parser already knows about it. The parser knows that this asm statement is followed by a block of indented C code. The parser can then eat the entire block of injected C code and process it like a multiline string literal. The C code ends the way every other Nim block ends: with its indentation. The injected symbols are listed at the top as an explicit list, as was planned in my quoteAst branch.

The reason for asm over emit is only that asm is a keyword. No other reason. It would also be wrong to call it asm because it isn't assembly. But asm is something that I would see as technically possible without breaking anything.

I will comment below on this

timotheecour · 2020-04-11T23:24:56Z

there are 2 separate parts to this proposal

part 1: symbol capture via `asm(efHere, [s, T]):`

On this I fully agree, with 1 amendment (see below). This is the exact analog of both our approaches to "fix" quote do ([superseded] new macros.genAst: sidesteps issues with quote do Nim#11722 and New quote ast 2 Nim#11823).
This would "fix"/close A1 from bugs with {.emit.} Nim#13943
It's always possible for the user to find a non-ambiguous symbol if the emitted code is a string literal (if it's a const string passed as argument, it's trickier; but that's an edge case).
I would amend your proposal by still requiring captured symbols to be backtick-quoted in the foreign code, to reduce ambiguities further, e.g.:

# can you spot the bug here?
asm(efHere, [s, T]):
  s = sizeof(T); // remember, this is C's code without quotes in Nim code

# answer: the `s` in `C's` would've matched; instead, the following is more robust:
asm(efHere, [s, T]):
  `s` = sizeof(`T`); // remember, this is C's code without quotes in Nim code

# only quoted+captured symbols are transformed, so `foo` is not captured here:
asm(efHere, [s, T]):
  `s` = sizeof(`T`); // some comment `foo` end of comment

Note: emit has another syntax: emit:[s, " = ", result, "++;"] but it's ugly, and the explicit capture proposal is better IMO
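
A minimal sketch of the substitution rule in the amendment above: only symbols that are both backtick-quoted and in the capture list get rewritten. The proc name `substituteCaptured` and the mangled names `s_1` and `NI` are hypothetical, made up for illustration. This is plain string processing under those assumptions, not how the compiler implements emit.

```nim
import std/[tables, strutils]

proc substituteCaptured(code: string, captured: Table[string, string]): string =
  ## Replaces each `name` in `code` by captured[name].
  ## Backtick-quoted names that are not captured, and all unquoted text,
  ## are copied through unchanged.
  var i = 0
  while i < code.len:
    if code[i] == '`':
      let endPos = code.find('`', i + 1)
      if endPos > i:
        let name = code[i + 1 ..< endPos]
        if name in captured:
          result.add captured[name]
        else:
          # Quoted but not captured: keep the text, backticks included.
          result.add code[i .. endPos]
        i = endPos + 1
        continue
    result.add code[i]
    inc i

when isMainModule:
  # Hypothetical mapping from captured Nim symbols to backend names.
  let mangled = {"s": "s_1", "T": "NI"}.toTable
  echo substituteCaptured("`s` = sizeof(`T`); // C's code, `foo` stays", mangled)
```

The `s` inside `C's` is untouched because it is not quoted, and `foo` is untouched because it is not captured.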

part 2: indentation based asm code blocks

With this I disagree: asm is the wrong place to introduce this feature. If this feature were introduced, it should instead be introduced for string literals, and then it would simply work for asm without a special case (i.e. asm could take any string literal, including a hypothetical indentation-based string literal).

So this discussion reduces to #161 (for multiline string literals using indentation) and should be discussed there instead of here.

And my position is still #161 (comment): I am instead in favor of heredoc literals, which avoid all the pitfalls of re-indentation (tools like git diff producing large diffs, wrong column info in clang after de-indenting the emitted code, etc.). Heredoc literals are great because you can always find an unambiguous symbol, and you don't have to mess anything up (including diffing) with re-indentation. A number of languages have them, e.g.:

C++ (since C++11, see https://en.cppreference.com/w/cpp/language/string_literal)
D (see https://dlang.org/spec/lex.html)
PHP
even bash
...

part 3: syntax highlighter hint

(not part of your proposal, but relevant for string literals in general and especially in the context of asm/emit)

A heredoc string literal should be able to take an argument as a hint telling syntax highlighters how to highlight the literal, e.g.:

asm(esfHere, [S,T]): @"EOS
auto fun = [](){}; // some C++ code that syntax highlighters can't highlight as anything but a string
EOS"

vs

asm(esfHere, [S,T]): @"EOS(c++)
auto fun = [](){}; // some C++ code that tools can highlight as C++ thanks to the hint
EOS"

example

See e.g. https://github.com/yglukhov/ttf/blob/master/ttf.nim which contains large emit blocks. These would be much cleaner with heredoc than with re-indentation, especially if future PRs have to reindent, causing large diffs.
The same argument applies when comparing emitted code in Nim with the original code (stb_truetype.h) it was taken from: heredoc introduces 0 diff, whereas indentation would cause pain (and ignoring whitespace, e.g. with -w or similar in git diff, is even worse, because it would ignore true differences).

part 4: tooling support

Making tools support the new syntax for heredoc literals (or the indentation-based alternative) is not as hard as you may think (and the worst case is that syntax highlighting appears wrong, which is not the end of the world and can be fixed relatively quickly).
A lot of tools depend on a few syntax definitions, the main one being https://github.com/Varriount/NimLime/tree/master/Syntaxes (which is read by GitHub's syntax highlighter, linguist), so we'd just need PRs like https://github.com/github/linguist/pull/4295/files or https://github.com/Varriount/NimLime/pull/127/files

Araq · 2020-04-14T18:05:40Z

-1 from me and I already debunked point (4). If it's easy to find a token that is not in the string literal you can simply use

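import std/strutils

const
  TripleQuote = "\"\"\""

# UNUSED is any placeholder token that does not occur in the literal itself
let s = """string
UNUSED
literal
""".replace("UNUSED", TripleQuote)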

We can add TripleQuote to strutils if enough people think it's valuable. No lexer additions required.

krux02 · 2020-04-15T10:44:32Z

I am thinking about editor support right now, in the sense that the editor detects that there is an emit statement and that it emits C code. For that C code it would be great to highlight with C syntax highlighting. But for this to work the editor would need to detect that the string literal is actually C code, not just a generic block string literal. Therefore I suggest that the emit statement should contain something that explicitly states that it emits C code, C++ code, or JS code. Here is an example:

emitC(efHere, [s, T]):
  s = sizeof(T); // remember, this is C's code without quotes in Nim code

Not entirely sure if this works though. The idea here is that the editor could detect the emitC(...): pattern and switch the syntax highlighting for the following block of code to actual C syntax highlighting. If you work with many emit statements this is very valuable. Another bonus: if you accidentally try to emit C code into JavaScript or vice versa, the Nim compiler could emit a nice error message about it, instead of the backend compiler/interpreter.
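
The backend check mentioned above can already be approximated today with `when defined(...)`. The sketch below is only an illustration: `Backend`, `currentBackend` and `requireBackend` are hypothetical names, not part of the proposal or of any existing module, and the Objective-C backend is lumped in with C for brevity.

```nim
type Backend = enum
  bkC, bkCpp, bkJs

# The compiler defines `js` / `cpp` depending on the backend in use.
const currentBackend =
  when defined(js): bkJs
  elif defined(cpp): bkCpp
  else: bkC

template requireBackend(wanted: static Backend) =
  ## Stops compilation when foreign code is written for another backend.
  when wanted != currentBackend:
    {.error: "foreign code was written for a different backend".}

# Compiles under `nim c`; under `nim js` or `nim cpp` it stops with the error above.
requireBackend(bkC)
```

A hypothetical `emitC` could run such a check first, so the mistake is reported by the Nim compiler rather than by the backend compiler or interpreter.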

Araq · 2020-04-15T11:56:35Z

Why do we need an even more powerful emit statement though? Why not let Nim be a real language that compiles via C or LLVM to binary code?

krux02 · 2020-04-15T12:22:21Z

The reason Scala is a successful language is that it plays nice with an existing Java codebase. The reason C++ is successful is that it plays nice with an existing C codebase. The reason TypeScript is successful is that it plays nice with an existing JavaScript codebase. The reason Nim is successful is ... No wait, Nim isn't successful, because nice C and C++ integration has been removed in order to make it a "real language".

Yes, there is a price for this emit. But it is worth it.

krux02 · 2020-04-15T12:22:41Z

Please don't delete my comments just because you don't feel comfortable with them.

Araq · 2020-04-15T12:29:24Z

I delete them because it's pure trolling, Scala and C++ do not even have emit to begin with.

krux02 · 2020-04-15T12:43:31Z

C++ also doesn't have an emit. What an argument.

Araq · 2020-04-15T14:01:13Z

Yet Nim needs an even better emit in order to be successful. Your argument, not mine.

krux02 · 2020-04-15T15:23:51Z

> Yet Nim needs an even better emit in order to be successful.

Yes.

timotheecour · 2020-04-15T18:52:48Z

Every improvement matters.
Certainly not a sufficient condition to be successful, but yes, improving interop is critical. Nlvm, while very promising, is clearly not a drop-in replacement, so we have to work with what we have and improve it instead of "throwing in the towel".

C++ doesn't need emit because it's mostly backward compatible with C, and can use extern "C" for C mangling/linkage etc.
Go has its equivalent of emit, and it is an important feature, see https://golang.org/cmd/cgo/
Rust also has its equivalent, see https://github.com/mystor/rust-cpp

Araq · 2020-04-16T08:53:51Z

A minimum of effort has to be put into RFCs, every clear RFC matters. Just look at the title -- "asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints", that's not an RFC, that's a figleaf for your PR.

If you want better string literals, there is an existing RFC for that, see #161

If you want typo-safe emit sections, write an RFC for that and outline how future code generators can deal with it.

timotheecour · 2020-04-17T22:21:13Z

> If you want better string literals, there is an existing RFC for that, see #161

I know, that's exactly what I pointed out in my reply to the top post, see #210 (comment):

> this I disagree, asm is the wrong place to introduce this feature; if this feature were introduced it should be instead introduced for string literals,[...] So this discussion reduces to #161 [...] and should be discussed there instead of here.

> If you want typo-safe emit sections, write an RFC for that and outline how future code generators can deal with it.

will get to it and clean up this RFC (please don't close in meantime)

> every clear RFC matters

no disagreement there

Araq · 2020-10-28T11:54:47Z

> will get to it and clean up this RFC (please don't close in meantime)

The meantime is now 6 months. Closing.

timotheecour mentioned this issue Apr 11, 2020

emit("here"): "c code" + other emit fixes; new module experimental/backendutils.nim nim-lang/Nim#13953

Closed

timotheecour changed the title ~~[WIP, do not review] asm blocks for foreign code without string literals~~ [WIP, do not review] asm/emit syntax: explicit capture of symbols Apr 11, 2020

timotheecour changed the title ~~[WIP, do not review] asm/emit syntax: explicit capture of symbols~~ asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints Apr 11, 2020

nim-lang deleted a comment from krux02 Apr 15, 2020

haxscramper mentioned this issue Sep 2, 2020

Unquoted indentation-based string literals #248

Closed

Araq closed this as completed Oct 28, 2020

timotheecour added the wontfix This will not be worked on label Oct 28, 2020
