asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints #210

Closed
timotheecour opened this issue Apr 11, 2020 · 14 comments
Labels
wontfix This will not be worked on


@timotheecour
Member

timotheecour commented Apr 11, 2020

Discussing here @krux02's proposal for a new asm/emit syntax (from nim-lang/Nim#13953 (comment)):

I personally don't like the pragma notation for emit. I also don't like multi-line string literals. So this is my idea to avoid them:

template c_sizeof*(T: typedesc): int =
  var s: int
  # the symbols s and T are captured
  asm(efHere, [s, T]):
    s = sizeof(T); // remember, this is C code without quotes in Nim code
  s

This is how it can work: I use the asm keyword here instead of emit. asm is a keyword in the language, therefore the Nim parser already knows about it. The parser knows that this asm statement is followed by a block of indented C code, so it can consume the entire block of injected C code and process it like a multi-line string literal. The C code ends the way every other Nim block ends: when the indentation decreases again. The injected symbols are listed at the top as an explicit list, as was planned in my quoteAst branch.

The only reason for choosing asm over emit is that asm is a keyword; there is no other reason. Strictly speaking it is wrong to call it asm, because it isn't assembly. But asm is something that I see as technically possible without breaking anything.
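A minimal sketch of the block rule described above, written as an ordinary Nim proc over source lines rather than as a change to the real Nim parser: after a header such as `asm(efHere, [s, T]):`, every following line that is blank or indented deeper than the header belongs to the raw C block, and the block ends at the first non-blank line that dedents. The names `indentOf` and `takeIndentedBlock`, and the choice to strip the block's common indentation, are assumptions for illustration.

```nim
import std/strutils

proc indentOf(line: string): int =
  ## Number of leading spaces in `line` (Nim source does not allow tabs).
  while result < line.len and line[result] == ' ':
    inc result

proc takeIndentedBlock(lines: seq[string], headerIdx: int): tuple[body: string, nextIdx: int] =
  ## Returns the raw text of the block that follows `lines[headerIdx]`
  ## and the index of the first line after the block.
  let headerIndent = indentOf(lines[headerIdx])
  var i = headerIdx + 1
  var collected: seq[string]
  # Blank lines and lines indented deeper than the header belong to the block;
  # the first non-blank line at the header's indentation (or less) ends it.
  while i < lines.len:
    let line = lines[i]
    if line.strip.len > 0 and indentOf(line) <= headerIndent:
      break
    collected.add line
    inc i
  # Trailing blank lines are left to whatever follows the block.
  while collected.len > 0 and collected[^1].strip.len == 0:
    discard collected.pop()
    dec i
  # Remove the indentation shared by all non-blank lines (the de-indentation step).
  var common = high(int)
  for line in collected:
    if line.strip.len > 0:
      common = min(common, indentOf(line))
  var outLines: seq[string]
  for line in collected:
    outLines.add(if line.len >= common: line[common .. ^1] else: "")
  result = (outLines.join("\n"), i)
```

The de-indentation at the end is the step the first reply objects to: the C text handed to the backend no longer has the columns it had in the Nim file.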

I will comment on this below.

@timotheecour
Member Author

timotheecour commented Apr 11, 2020

There are two separate parts to this proposal.

part 1: symbol capture via asm(efHere, [s, T]):

# can you spot the bug here?
asm(efHere, [s, T]):
  s = sizeof(T); // remember, this is C's code without quotes in Nim code

# answer: the `s` in `C's` would have matched too; the following is more robust:
asm(efHere, [s, T]):
  `s` = sizeof(`T`); // remember, this is C's code without quotes in Nim code

# only quoted+captured symbols are transformed, so `foo` is not captured here:
asm(efHere, [s, T]):
  `s` = sizeof(`T`); // some comment `foo` end of comment
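The rule in the last snippet (only identifiers that are both backtick-quoted and listed in the capture list are substituted) can be sketched as plain string processing. This is a hypothetical helper, not compiler code: `expandCaptures` is an illustrative name, and the `mangle` callback stands in for however the compiler would map a Nim symbol to its generated C name. The block also reproduces the bug from the first snippet using strutils' `replaceWord`, which treats the `'` in `C's` as a word boundary.

```nim
import std/[strutils, sets]

proc expandCaptures(code: string, captures: HashSet[string],
                    mangle: proc (name: string): string): string =
  ## Replaces `name` (with its backticks) by mangle(name) when `name` is in
  ## `captures`. Quoted names that were not captured, such as `foo` inside a
  ## C comment, are copied through verbatim, backticks included.
  result = ""
  var i = 0
  while i < code.len:
    if code[i] == '`':
      let close = code.find('`', i + 1)
      if close > i:
        let name = code[i + 1 ..< close]
        result.add(if name in captures: mangle(name) else: code[i .. close])
        i = close + 1
        continue
    result.add code[i]
    inc i

# Naive whole-word substitution also rewrites the `s` in "C's":
doAssert "s = sizeof(T); // C's code".replaceWord("s", "x") ==
  "x = sizeof(T); // C'x code"
```

With captures `s` and `T`, the input `` `s` = sizeof(`T`); // C's code `foo` `` keeps `C's` and `` `foo` `` intact and only rewrites the two captured names.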

Note: emit already has another syntax, the array form {.emit: [s, " = ", result, "++;"].}, but it's ugly, and the explicit capture proposal is better IMO.
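For comparison, the existing array form of the emit pragma already lets Nim symbols be referenced outside string literals: identifiers in the array are resolved to their generated C names, and string elements are copied verbatim. A minimal example, assuming the C backend (`nim c -r`) and a platform where C `int` is 32 bits, as Nim's `cint` assumes; the proc names are illustrative:

```nim
proc cIntSize(): int =
  ## `s` is a Nim local; the compiler substitutes its mangled C name.
  var s: int
  {.emit: [s, " = sizeof(int);"].}
  s

proc bump(x: int): int =
  ## Same pattern as the snippet in the note: `result` is referenced by symbol.
  result = x
  {.emit: [result, "++;"].}
```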

part 2: indentation based asm code blocks

This I disagree with: asm is the wrong place to introduce this feature; if this feature were introduced, it should instead be introduced for string literals, and then it would simply work for asm without a special case (i.e. asm could take any string literal, including a hypothetical indentation-based string literal).

So this discussion reduces to #161 (multi-line string literals using indentation) and should be discussed there instead of here.

And my position is still the one in #161 (comment): I favor HEREDOC literals instead, which avoid all the pitfalls of re-indentation (tools like git diff producing large diffs, wrong column info from clang after the emitted code has been de-indented, etc.). Heredoc literals are great because you can always find an unambiguous delimiter, and you don't have to mess with anything (including diffs) through re-indentation. A number of languages have them, e.g.:
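On the claim that an unambiguous delimiter can always be found, a small sketch: since the embedded text is finite, trying `EOS`, `EOS1`, `EOS2`, ... must eventually produce a tag that does not occur in it. The `@"TAG ... TAG"` spelling follows the hypothetical syntax in part 3; `chooseDelimiter` and `toHeredoc` are illustrative names, and requiring the tag not to occur anywhere is stricter than the proposal needs (only a closing `TAG"` line would be ambiguous).

```nim
import std/strutils

proc chooseDelimiter(content: string, base = "EOS"): string =
  ## Returns `base`, or `base` followed by a number, such that the result
  ## occurs nowhere in `content`. Terminates because `content` is finite.
  result = base
  var n = 0
  while result in content:
    inc n
    result = base & $n

proc toHeredoc(content: string): string =
  ## Wraps `content` in the hypothetical @"TAG ... TAG" form.
  ## The content is copied verbatim: nothing is re-indented.
  let tag = chooseDelimiter(content)
  result = "@\"" & tag & "\n" & content & "\n" & tag & "\""
```

Because the content is embedded byte-for-byte, a diff against the original C source (such as stb_truetype.h in the example below) stays empty.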

part 3: syntax highlighter hint

(not part of your proposal, but relevant to string literals in general and especially in the context of asm/emit)

A heredoc string literal should be able to take an argument as a hint telling syntax highlighters how to highlight the literal, e.g.:

asm(efHere, [s, T]): @"EOS
auto fun = [](){}; // some C++ code that syntax highlighters can't highlight as anything but a string
EOS" 

vs

asm(efHere, [s, T]): @"EOS(c++)
auto fun = [](){}; // some C++ code that tools should highlight as C++ thanks to the hint
EOS" 
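How a lexer or syntax highlighter could read the hinted form above, as a sketch: the opening line is `@"` followed by the tag and an optional `(hint)`, and the literal ends at the first line consisting of the tag followed by `"`. This is the hypothetical syntax from this comment, not anything Nim's lexer accepts; `parseHeredoc` and the `Heredoc` type are illustrative names.

```nim
import std/[strutils, options]

type Heredoc = object
  hint: Option[string]  # e.g. some("c++") for @"EOS(c++)
  body: string

proc parseHeredoc(src: string): Heredoc =
  let lines = src.splitLines
  if lines.len < 2 or not lines[0].startsWith("@\""):
    raise newException(ValueError, "not a heredoc literal")
  var head = lines[0][2 .. ^1]            # tag plus optional hint, e.g. EOS(c++)
  let paren = head.find('(')
  if paren >= 0 and head.endsWith(")"):
    result.hint = some(head[paren + 1 .. ^2])
    head = head[0 ..< paren]
  let closing = head & "\""               # closing line, e.g. EOS"
  for i in 1 ..< lines.len:
    # trailing whitespace after the closing tag is ignored
    if lines[i].strip(leading = false) == closing:
      result.body = lines[1 ..< i].join("\n")
      return
  raise newException(ValueError, "unterminated heredoc, expected " & closing)
```

A highlighter would pick its C++ grammar when `hint` is `some("c++")` and fall back to plain-string highlighting when it is `none`.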

example

  • see e.g. https://github.com/yglukhov/ttf/blob/master/ttf.nim, which contains large emit blocks. These are much cleaner with heredoc than with re-indentation, especially if future PRs have to reindent them, causing large diffs.
    The same argument applies when comparing the emitted code in Nim with the original code it was taken from (stb_truetype.h): heredoc introduces zero diff, whereas indentation would cause pain (and ignoring whitespace, e.g. with -w or similar in git diff, is even worse, because it would also ignore real differences)

part 4: tooling support

Making tools support the new syntax for heredoc literals (or the indentation-based alternative) is not as hard as you may think (and the worst case is that syntax highlighting appears wrong, which is not the end of the world and can be fixed relatively quickly).
A lot of tools depend on a few syntax definitions; the main one is https://github.com/Varriount/NimLime/tree/master/Syntaxes (which is used by GitHub's syntax highlighter, linguist), so we'd just need PRs like https://github.com/github/linguist/pull/4295/files or https://github.com/Varriount/NimLime/pull/127/files

@timotheecour timotheecour changed the title [WIP, do not review] asm blocks for foreign code without string literals [WIP, do not review] asm/emit syntax: explicit capture of symbols Apr 11, 2020
@timotheecour timotheecour changed the title [WIP, do not review] asm/emit syntax: explicit capture of symbols asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints Apr 11, 2020
@Araq
Member

Araq commented Apr 14, 2020

-1 from me and I already debunked point (4). If it's easy to find a token that is not in the string literal you can simply use

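import std/strutils

const
  TripleQuote = "\"\"\""

# "UNUSED" stands in for a triple quote inside the literal and is
# replaced afterwards, so no lexer additions are needed.
let s = """string
UNUSED
literal
""".replace("UNUSED", TripleQuote)
echo s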

We can add TripleQuote to strutils if enough people think it's valuable. No lexer additions required.

@krux02
Contributor

krux02 commented Apr 15, 2020

I am thinking about editor support right now, in these terms: the editor detects that there is an emit statement, and that it emits C code. For that C code it would be great to have C syntax highlighting. But for this to work, the editor would need to detect that the string literal is actually C code, not just a generic block string literal. Therefore I suggest that the emit statement should contain something that explicitly states whether it emits C code, C++ code, or JS code. Here is an example:

emitC(efHere, [s, T]):
  s = sizeof(T); // remember, this is C's code without quotes in Nim code

Not entirely sure if this works, though. The idea here is that the editor could detect the emitC(...): pattern and switch the syntax highlighting for the following block of code to actual C syntax highlighting. If you work with many emit statements, this is very valuable. Another bonus: if you accidentally try to emit C code into JavaScript or vice versa, the Nim compiler could emit a nice error message about it, instead of the backend compiler/interpreter.
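The `emitC` statement above is hypothetical, but its second benefit (a Nim-level error instead of a backend error when C code is emitted for the JS target) can already be approximated with `when defined(js)` and the `{.error.}` pragma. A minimal sketch, assuming the C backend for the normal path; `answer` is an illustrative name, and this does nothing for editor highlighting:

```nim
proc answer(): cint =
  when defined(js):
    # Compiling for JS stops here with a Nim error, before any backend runs.
    {.error: "answer() emits C code and cannot be compiled for the JS backend".}
  else:
    # C/C++ backends: `result` is resolved to its generated C name.
    {.emit: [result, " = 42;"].}
```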

@nim-lang nim-lang deleted a comment from krux02 Apr 15, 2020
@Araq
Member

Araq commented Apr 15, 2020

Why do we need an even more powerful emit statement, though? Why not let Nim be a real language that compiles via C or LLVM to binary code?

@nim-lang nim-lang deleted a comment from krux02 Apr 15, 2020
@krux02
Contributor

krux02 commented Apr 15, 2020

The reason Scala is a successful language is, it plays nice with an existing Java codebase. The reason C++ is successful is, it plays nice with an existing C codebase. The reason TypeScript is successful is, it plays nice with an existing JavaScript codebase. The reason Nim is successful is ... No wait, Nim isn't successful, because nice C and C++ integration has been removed in order to make it a "real language".

Yes, there is a price for this emit. But it is worth it.

@krux02
Contributor

krux02 commented Apr 15, 2020

Please don't delete my comments just because you don't feel comfortable with it.

@Araq
Member

Araq commented Apr 15, 2020

I delete them because it's pure trolling, Scala and C++ do not even have emit to begin with.

@krux02
Contributor

krux02 commented Apr 15, 2020

C++ also doesn't have an emit. What an argument.

@Araq
Member

Araq commented Apr 15, 2020

Yet Nim needs an even better emit in order to be successful. Your argument, not mine.

@krux02
Contributor

krux02 commented Apr 15, 2020

Yet Nim needs an even better emit in order to be successful.

Yes.

@timotheecour
Member Author

timotheecour commented Apr 15, 2020

Every improvement matters.
It is certainly not a sufficient condition for success, but yes, improving interop is critical. Nlvm, while very promising, is clearly not a drop-in replacement, so we have to work with what we have and improve it instead of "throwing in the towel".

@Araq
Member

Araq commented Apr 16, 2020

A minimum of effort has to be put into RFCs, every clear RFC matters. Just look at the title -- "asm/emit syntax: explicit capture of symbols; heredoc string litterals; syntax highlight hints", that's not an RFC, that's a fig leaf for your PR.

If you want better string literals, there is an existing RFC for that, see #161

If you want typo-safe emit sections, write an RFC for that and outline how future code generators can deal with it.

@timotheecour
Member Author

timotheecour commented Apr 17, 2020

If you want better string literals, there is an existing RFC for that, see #161

I know; that's exactly what I pointed out in my reply to the top post, see #210 (comment):

This I disagree with: asm is the wrong place to introduce this feature; if this feature were introduced, it should instead be introduced for string literals, [...] So this discussion reduces to #161 [...] and should be discussed there instead of here.

If you want typo-safe emit sections, write an RFC for that and outline how future code generators can deal with it.

will get to it and clean up this RFC (please don't close in meantime)

every clear RFC matters

no disagreement there

@Araq
Member

Araq commented Oct 28, 2020

will get to it and clean up this RFC (please don't close in meantime)

The meantime is now 6 months. Closing.

@Araq Araq closed this as completed Oct 28, 2020
@timotheecour timotheecour added the wontfix This will not be worked on label Oct 28, 2020
3 participants