Allow symbols in PLC names #5218

zliu41 · 2023-03-20T01:48:00Z

Doesn't seem harmful, and improves readability.

effectfully

I have very mixed feelings about this:

can it clash with the readable pretty-printer? Maybe the latter omits spaces sometimes
$f, $c etc stuff is noisy and not helpful, it would be ideal if we parsed it and turned it into something readable
I think it's an improvement overall, because I didn't like throwing away useful information, in particular turning == into bad_name etc

I feel like the best solution would be to allow arbitrary ASCII symbols in identifies, but quote those identifiers that are not composed or alphanumerics, _ and '. We could use ` in the beginning of an identifier containing symbols to express "everything up until the next space is a single identifier". And yeah, I would like us to parse $f, $c etc.

michaelpj · 2023-03-20T11:23:30Z

I do like getting rid of bad_name.

Main question: does the parser handle these? I don't think we should allow non-parseable identifiers.

And yeah, I would like us to parse $f, $c etc.

I'm not sure what you mean by that? Do you mean "use our domain knowledge of GHC to reverse-engineer the meaning of those names and emit a more human readable version"? e.g. "$c becomes '_specialized'" or something? I'm not at all sure about that - I think it's a lot simpler to just reflect what we get from GHC; then at least knowledge about GHC transfers over, rather than it being a mix of GHC and our (probably partial) attempts to back-translate it.

zliu41 · 2023-03-20T14:46:14Z

Main question: does the parser handle these? I don't think we should allow non-parseable identifiers.

Yes, the PR includes update to the parser and tests.

effectfully · 2023-03-20T14:48:37Z

Do you mean "use our domain knowledge of GHC to reverse-engineer the meaning of those names and emit a more human readable version"? e.g. "$c becomes '_specialized'" or something?

Yes.

I'm not at all sure about that - I think it's a lot simpler to just reflect what we get from GHC; then at least knowledge about GHC transfers over, rather than it being a mix of GHC and our (probably partial) attempts to back-translate it.

Well, either we back-translate it, however partially, or all our users who care about performance (i.e. all users who actually deploy the code?) individually.

zliu41 · 2023-03-20T14:50:01Z

can it clash with the readable pretty-printer? Maybe the latter omits spaces sometimes

There's no need to have space between identifier names and non-identifier names (e.g., brackets). If it omits spaces between two things that both can be identifier names, then it is the pretty printer that should be fixed.

$f, $c etc stuff is noisy and not helpful, it would be ideal if we parsed it and turned it into something readable

Perhaps, but on the other hand these are standard in GHC Core, so will look more familiar to people who are used to it.

I think it's an improvement overall, because I didn't like throwing away useful information, in particular turning == into bad_name etc

Not only that, but currently, $fOrdInteger_$c>, $fOrdInteger_$c< etc., all become fOrdInteger_c which is quite annoying.

zliu41 · 2023-03-20T14:53:54Z

We could use ` in the beginning of an identifier containing symbols to express "everything up until the next space is a single identifier"

If we do that then we need to make sure there's a space after an identifier name, not a closing bracket or anything else. Also, to me that is a very unusual way of doing identifier names and I'm not aware of any other language that does that.

kwxm · 2023-03-20T18:41:25Z

I like this because it would allow me to have built-in function names with dots in them, which I was asking about the other day! I think there's no danger of breaking anything because the syntax of Plutus Core, for all its faults, is very minimal and doesn't make any use of the symbols that this PR makes valid in names. However, one argument against it might be that if someone like me was to go ahead and define builtins with strange characters in their names (which I think can be done simply by producing a hand-written Show instance for the DefaultFun type) then we could have builtins whose literal names are significantly different from the name of the constructor that represents them internally, which could lead to confusion. Perhaps I should just be forbidden from doing that. I suppose this might also be bad for anyone who's implemented tools that deal with PLC with alternative syntax which make use of the new symbols we're allowing here, but then maybe we don't support that kind of thing.

On the whole, I think I'm in favour of it because of the GHC compatibility problem that it solves.

kwxm · 2023-03-20T20:06:42Z

plutus-core/plutus-core/src/PlutusCore/Parser/ParserCommon.hs

@@ -114,7 +114,19 @@ inBraces :: Parser a -> Parser a
 inBraces = between (symbol "{") (char '}')

 isIdentifierChar :: Char -> Bool
-isIdentifierChar c = isAlphaNum c || c == '_' || c == '\''
+isIdentifierChar c = isAlphaNum c || elem c identifierSymbols


I could complain that this should be called isIdentifierNonStartingChar or something, but that would be horrible.

effectfully · 2023-03-20T22:59:03Z

If we do that then we need to make sure there's a space after an identifier name, not a closing bracket or anything else.

Sorry, I should've said "a delimiter" instead of "a space", of course.

Also, to me that is a very unusual way of doing identifier names and I'm not aware of any other language that does that.

Some languages have quoted identifiers, except those languages usually mark both the sides of a quoted identifier with a symbol. In PureScript the symbol is " (see this) and in SQL it can be ` (see this) or " (see this).

As for marking only one side, LISP uses a single ' to denote quotation, albeit a very different kind of quotation.

So I thought we could combine the two strategies. But perhaps they don't combine well indeed and it's worth to mark an identifier at both of its sides to quote it.

…1/symbols

zliu41 · 2023-04-24T22:18:53Z

in SQL it can be `

@effectfully I like this and I updated the PR to do so. The additional backticks are not quite aesthetically pleasing, and it adds some complexity to the parser, but it is a more robust way of allowing symbols in names.

This reverts commit a206f12.

michaelpj · 2023-03-20T11:17:32Z

plutus-core/plutus-ir/src/PlutusIR/Compiler/Names.hs

@@ -17,8 +17,9 @@ support unicode identifiers as well.

 replacements :: [(T.Text, T.Text)]
 replacements = [
-    -- this helps with module prefixes
-    (".", "_")
+    ("-", "minus")


I've removed replacements. I don't think we need it any more.

plutus-core/changelog.d/20230319_184206_unsafeFixIO_symbols.md

michaelpj · 2023-04-25T09:37:23Z

plutus-core/plutus-core/src/PlutusCore/Parser/ParserCommon.hs

+        strUnquoted <- takeWhileP (Just "identifier-quoted") isQuotedIdentifierChar
+        void $ char backTick
+        let str = T.cons backTick (T.snoc strUnquoted backTick)
+        Name str <$> intern str


I think I slightly disagree with this approach. This approach says that the parsed name is the string including the backticks. That seems wrong to me: the backticks are part of the syntax of names, that allow us to give an actual name which has special characters in it. The backticks themselves are not part of the name.

If we view them as syntax then we do things slightly differently:

The name we parse here is without the backticks

When we prettyprint a name, we need to decide what syntax to use depending on whether it can be printed unquoted or not

Scala is a language I know that supports quoted names. I thought foo and `foo` are different names, but I was wrong - they are the same name. So I agree with you.

@michaelpj Updated. The only slight problem is that (all *_0 (type) *_0) becomes (all `*`_0 (type) `*`_0) (see e1b585e). The * comes from dummyTyname. Can we use something else there, say a? That would be less confusing anyway since * usually refers to kinds.

Hilarious. The issue is that we don't want to use a because it's supposed to mean "some type", i.e. each can be distinct. That's not exactly clear anyway, though (you thought it was " * = type" rather than "* = anything"). Maybe we should use ?? Still has the quoting problem, though...

I'm also unsure about whether or not the _unique part should be "part of the name" or not 🤔 I think as it is it now won't parse again? Whereas before it would have done.

The issue is that we don't want to use a because it's supposed to mean "some type", i.e. each can be distinct.

We can still use a for that. I think GHC invents type variables on similar situations, rather than using *:

ghci> let x = (3 :: Int) False <interactive>:1:9: error: • Couldn't match expected type ‘Bool -> t’ with actual type ‘Int’

or we can use any

I'm also unsure about whether or not the _unique part should be "part of the name" or not thinking I think as it is it now won't parse again? Whereas before it would have done.

I think the behavior of Parser Name has always been wrong. If the name being parsed contains the unique. It would parse name_unique into the name field.

Now it is no worse than before. If we want to parse name_unique then we need to fix it separately.

We can still use a for that. I think GHC invents type variables on similar situations, rather than using *

GHC makes fresh type variables for each such thing, which is fine. We'd have to do something similar, which complicates things.

any seems okay.

I think the behavior of Parser Name has always been wrong. If the name being parsed contains the unique. It would parse name_unique into the name field.

It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though 🤔

GHC makes fresh type variables for each such thing, which is fine.

We can reserve a special unique for dummyName, but I'll just go with any.

It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though thinking

🤦

Ok, then I think the easiest solution to restore old behavior is to quote around name_unique, i.e., `foo_i0` instead of `foo`_i0

I do agree it's super ugly, and this now just makes the ugliness very apparent... not that that helps 🤷

We have an issue about pretty-printing of uniques. If anybody feels like taking over, that would be highly appreciated.

Updated based on comments above.

…1/symbols

Co-authored-by: Michael Peyton Jones <[email protected]>

…1/symbols

plutus-core/plutus-core/src/PlutusCore/DeBruijn/Internal.hs

michaelpj · 2023-04-26T09:55:38Z

Has conflicts.

effectfully

Didn't review in detail, but seems good to me.

effectfully · 2023-04-25T18:47:00Z

plutus-core/plutus-core/src/PlutusCore/Name.hs

@@ -55,6 +61,31 @@ data Name = Name
    deriving stock (Show, Generic, Lift)
    deriving anyclass (NFData, Hashable)

+-- | Allowed characters in the starting position of a non-quoted identifier.
+isIdentifierStartingChar :: Char -> Bool
+isIdentifierStartingChar c = isAscii c && isAlpha c


Any specific reason why no _?

plutus-tx-plugin/plutus-tx-plugin.cabal

effectfully · 2023-04-26T00:27:01Z

plutus-core/untyped-plutus-core/test/Evaluation/Golden/iteAtIntegerAtInteger.type.golden

@@ -1,6 +1,6 @@
 (Left Type mismatch at () in term
  '{ (builtin ifThenElse) (con integer) }'.
 Expected type
-  '(all *_0 (type) *_0)',
+  '(all `*`_0 (type) `*`_0)',


We should just fix those stupid error messages and remove the dummy stuff entirely. Could you create a ticket for it?

https://input-output.atlassian.net/browse/PLT-6059

kwxm

What's the status of this? I just noticed that it was on my list of things to review: I did review it at the time but I think I forgot to click the approve button. It seems harmless and easy to change if we later decide that we don't like something, so it looks safe to merge (although it's probably got a bit out of sync by now).

…1/symbols

zliu41

What's the status of this?

Oh, it completely slipped my mind. I think it's ready to merge.

zliu41 · 2023-05-24T21:06:07Z

plutus-core/untyped-plutus-core/test/Evaluation/Golden/iteAtIntegerAtInteger.type.golden

@@ -1,6 +1,6 @@
 (Left Type mismatch at () in term
  '{ (builtin ifThenElse) (con integer) }'.
 Expected type
-  '(all *_0 (type) *_0)',
+  '(all `*`_0 (type) `*`_0)',


https://input-output.atlassian.net/browse/PLT-6059

Allow symbols in PLC names

429d591

zliu41 requested review from michaelpj, effectfully and kwxm March 20, 2023 01:48

effectfully reviewed Mar 20, 2023

View reviewed changes

kwxm approved these changes Mar 20, 2023

View reviewed changes

zliu41 added 3 commits April 24, 2023 13:51

Merge branch 'master' of github.com:input-output-hk/plutus into zliu4…

11dc3d5

…1/symbols

Use quoted names

1e13fa6

Fix pretty printer

a206f12

zliu41 requested review from effectfully and kwxm April 24, 2023 22:19

Revert "Fix pretty printer"

037ea0b

This reverts commit a206f12.

michaelpj reviewed Apr 25, 2023

View reviewed changes

zliu41 and others added 7 commits April 25, 2023 05:41

Merge branch 'master' of github.com:input-output-hk/plutus into zliu4…

586cf58

…1/symbols

Backticks are not part of a name

e1b585e

Add a haddock

93037d5

Update plutus-core/changelog.d/20230319_184206_unsafeFixIO_symbols.md

f82b050

Co-authored-by: Michael Peyton Jones <[email protected]>

Revert accidental change to plutus-tx-plugin.cabal

cfec3df

Merge branch 'master' of github.com:input-output-hk/plutus into zliu4…

2b85361

…1/symbols

Use any and update pretty printer

3f1bf96

zliu41 requested a review from michaelpj April 26, 2023 03:03

michaelpj reviewed Apr 26, 2023

View reviewed changes

plutus-core/plutus-core/src/PlutusCore/DeBruijn/Internal.hs Outdated Show resolved Hide resolved

effectfully approved these changes Apr 26, 2023

View reviewed changes

kwxm approved these changes May 24, 2023

View reviewed changes

Merge branch 'master' of github.com:input-output-hk/plutus into zliu4…

be7c740

…1/symbols

zliu41 commented May 24, 2023

View reviewed changes

zliu41 added 2 commits May 24, 2023 14:08

Address comments

051a411

formatting

0aec75b

zliu41 merged commit c4a404f into master May 24, 2023

zliu41 deleted the zliu41/symbols branch May 24, 2023 21:58

v0d1ch pushed a commit to v0d1ch/plutus that referenced this pull request Dec 6, 2024

Allow symbols in PLC names (IntersectMBO#5218)

afbc3ab

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow symbols in PLC names #5218

Allow symbols in PLC names #5218

zliu41 commented Mar 20, 2023

effectfully left a comment

michaelpj commented Mar 20, 2023

zliu41 commented Mar 20, 2023

effectfully commented Mar 20, 2023

zliu41 commented Mar 20, 2023 •

edited

Loading

zliu41 commented Mar 20, 2023

kwxm commented Mar 20, 2023 •

edited

Loading

kwxm Mar 20, 2023

effectfully commented Mar 20, 2023

zliu41 commented Apr 24, 2023

michaelpj Mar 20, 2023

zliu41 Apr 25, 2023

michaelpj Apr 25, 2023

zliu41 Apr 25, 2023

zliu41 Apr 25, 2023

michaelpj Apr 25, 2023

zliu41 Apr 25, 2023

michaelpj Apr 25, 2023

zliu41 Apr 25, 2023

michaelpj Apr 25, 2023

effectfully Apr 25, 2023

zliu41 Apr 26, 2023

michaelpj commented Apr 26, 2023

effectfully left a comment

effectfully Apr 25, 2023

effectfully Apr 26, 2023

zliu41 May 24, 2023

kwxm left a comment

zliu41 left a comment

zliu41 May 24, 2023

Allow symbols in PLC names #5218

Allow symbols in PLC names #5218

Conversation

zliu41 commented Mar 20, 2023

effectfully left a comment

Choose a reason for hiding this comment

michaelpj commented Mar 20, 2023

zliu41 commented Mar 20, 2023

effectfully commented Mar 20, 2023

zliu41 commented Mar 20, 2023 • edited Loading

zliu41 commented Mar 20, 2023

kwxm commented Mar 20, 2023 • edited Loading

Choose a reason for hiding this comment

effectfully commented Mar 20, 2023

zliu41 commented Apr 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

michaelpj commented Apr 26, 2023

effectfully left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kwxm left a comment

Choose a reason for hiding this comment

zliu41 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zliu41 commented Mar 20, 2023 •

edited

Loading

kwxm commented Mar 20, 2023 •

edited

Loading