Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow symbols in PLC names #5218

Merged
merged 15 commits into from
May 24, 2023
Merged

Allow symbols in PLC names #5218

merged 15 commits into from
May 24, 2023

Conversation

zliu41
Copy link
Member

@zliu41 zliu41 commented Mar 20, 2023

Doesn't seem harmful, and improves readability.

Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have very mixed feelings about this:

  1. can it clash with the readable pretty-printer? Maybe the latter omits spaces sometimes
  2. $f, $c etc stuff is noisy and not helpful, it would be ideal if we parsed it and turned it into something readable
  3. I think it's an improvement overall, because I didn't like throwing away useful information, in particular turning == into bad_name etc

I feel like the best solution would be to allow arbitrary ASCII symbols in identifies, but quote those identifiers that are not composed or alphanumerics, _ and '. We could use ` in the beginning of an identifier containing symbols to express "everything up until the next space is a single identifier". And yeah, I would like us to parse $f, $c etc.

@michaelpj
Copy link
Contributor

I do like getting rid of bad_name.

Main question: does the parser handle these? I don't think we should allow non-parseable identifiers.

And yeah, I would like us to parse $f, $c etc.

I'm not sure what you mean by that? Do you mean "use our domain knowledge of GHC to reverse-engineer the meaning of those names and emit a more human readable version"? e.g. "$c becomes '_specialized'" or something? I'm not at all sure about that - I think it's a lot simpler to just reflect what we get from GHC; then at least knowledge about GHC transfers over, rather than it being a mix of GHC and our (probably partial) attempts to back-translate it.

@zliu41
Copy link
Member Author

zliu41 commented Mar 20, 2023

Main question: does the parser handle these? I don't think we should allow non-parseable identifiers.

Yes, the PR includes update to the parser and tests.

@effectfully
Copy link
Contributor

Do you mean "use our domain knowledge of GHC to reverse-engineer the meaning of those names and emit a more human readable version"? e.g. "$c becomes '_specialized'" or something?

Yes.

I'm not at all sure about that - I think it's a lot simpler to just reflect what we get from GHC; then at least knowledge about GHC transfers over, rather than it being a mix of GHC and our (probably partial) attempts to back-translate it.

Well, either we back-translate it, however partially, or all our users who care about performance (i.e. all users who actually deploy the code?) individually.

@zliu41
Copy link
Member Author

zliu41 commented Mar 20, 2023

can it clash with the readable pretty-printer? Maybe the latter omits spaces sometimes

There's no need to have space between identifier names and non-identifier names (e.g., brackets). If it omits spaces between two things that both can be identifier names, then it is the pretty printer that should be fixed.

$f, $c etc stuff is noisy and not helpful, it would be ideal if we parsed it and turned it into something readable

Perhaps, but on the other hand these are standard in GHC Core, so will look more familiar to people who are used to it.

I think it's an improvement overall, because I didn't like throwing away useful information, in particular turning == into bad_name etc

Not only that, but currently, $fOrdInteger_$c>, $fOrdInteger_$c< etc., all become fOrdInteger_c which is quite annoying.

@zliu41
Copy link
Member Author

zliu41 commented Mar 20, 2023

We could use ` in the beginning of an identifier containing symbols to express "everything up until the next space is a single identifier"

If we do that then we need to make sure there's a space after an identifier name, not a closing bracket or anything else. Also, to me that is a very unusual way of doing identifier names and I'm not aware of any other language that does that.

@kwxm
Copy link
Contributor

kwxm commented Mar 20, 2023

I like this because it would allow me to have built-in function names with dots in them, which I was asking about the other day! I think there's no danger of breaking anything because the syntax of Plutus Core, for all its faults, is very minimal and doesn't make any use of the symbols that this PR makes valid in names. However, one argument against it might be that if someone like me was to go ahead and define builtins with strange characters in their names (which I think can be done simply by producing a hand-written Show instance for the DefaultFun type) then we could have builtins whose literal names are significantly different from the name of the constructor that represents them internally, which could lead to confusion. Perhaps I should just be forbidden from doing that. I suppose this might also be bad for anyone who's implemented tools that deal with PLC with alternative syntax which make use of the new symbols we're allowing here, but then maybe we don't support that kind of thing.

On the whole, I think I'm in favour of it because of the GHC compatibility problem that it solves.

@@ -114,7 +114,19 @@ inBraces :: Parser a -> Parser a
inBraces = between (symbol "{") (char '}')

isIdentifierChar :: Char -> Bool
isIdentifierChar c = isAlphaNum c || c == '_' || c == '\''
isIdentifierChar c = isAlphaNum c || elem c identifierSymbols
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could complain that this should be called isIdentifierNonStartingChar or something, but that would be horrible.

@effectfully
Copy link
Contributor

If we do that then we need to make sure there's a space after an identifier name, not a closing bracket or anything else.

Sorry, I should've said "a delimiter" instead of "a space", of course.

Also, to me that is a very unusual way of doing identifier names and I'm not aware of any other language that does that.

Some languages have quoted identifiers, except those languages usually mark both the sides of a quoted identifier with a symbol. In PureScript the symbol is " (see this) and in SQL it can be ` (see this) or " (see this).

As for marking only one side, LISP uses a single ' to denote quotation, albeit a very different kind of quotation.

So I thought we could combine the two strategies. But perhaps they don't combine well indeed and it's worth to mark an identifier at both of its sides to quote it.

@zliu41
Copy link
Member Author

zliu41 commented Apr 24, 2023

in SQL it can be `

@effectfully I like this and I updated the PR to do so. The additional backticks are not quite aesthetically pleasing, and it adds some complexity to the parser, but it is a more robust way of allowing symbols in names.

@zliu41 zliu41 requested review from effectfully and kwxm April 24, 2023 22:19
This reverts commit a206f12.
@@ -17,8 +17,9 @@ support unicode identifiers as well.

replacements :: [(T.Text, T.Text)]
replacements = [
-- this helps with module prefixes
(".", "_")
("-", "minus")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colon?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed replacements. I don't think we need it any more.

strUnquoted <- takeWhileP (Just "identifier-quoted") isQuotedIdentifierChar
void $ char backTick
let str = T.cons backTick (T.snoc strUnquoted backTick)
Name str <$> intern str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I slightly disagree with this approach. This approach says that the parsed name is the string including the backticks. That seems wrong to me: the backticks are part of the syntax of names, that allow us to give an actual name which has special characters in it. The backticks themselves are not part of the name.

If we view them as syntax then we do things slightly differently:

  • The name we parse here is without the backticks
  • When we prettyprint a name, we need to decide what syntax to use depending on whether it can be printed unquoted or not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala is a language I know that supports quoted names. I thought foo and `foo` are different names, but I was wrong - they are the same name. So I agree with you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelpj Updated. The only slight problem is that (all *_0 (type) *_0) becomes (all `*`_0 (type) `*`_0) (see e1b585e). The * comes from dummyTyname. Can we use something else there, say a? That would be less confusing anyway since * usually refers to kinds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hilarious. The issue is that we don't want to use a because it's supposed to mean "some type", i.e. each can be distinct. That's not exactly clear anyway, though (you thought it was " * = type" rather than "* = anything"). Maybe we should use ?? Still has the quoting problem, though...

I'm also unsure about whether or not the _unique part should be "part of the name" or not 🤔 I think as it is it now won't parse again? Whereas before it would have done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we don't want to use a because it's supposed to mean "some type", i.e. each can be distinct.

We can still use a for that. I think GHC invents type variables on similar situations, rather than using *:

ghci> let x = (3 :: Int) False

<interactive>:1:9: error:
    • Couldn't match expected type ‘Bool -> t’ with actual type ‘Int’

or we can use any

I'm also unsure about whether or not the _unique part should be "part of the name" or not thinking I think as it is it now won't parse again? Whereas before it would have done.

I think the behavior of Parser Name has always been wrong. If the name being parsed contains the unique. It would parse name_unique into the name field.

Now it is no worse than before. If we want to parse name_unique then we need to fix it separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still use a for that. I think GHC invents type variables on similar situations, rather than using *

GHC makes fresh type variables for each such thing, which is fine. We'd have to do something similar, which complicates things.

any seems okay.

I think the behavior of Parser Name has always been wrong. If the name being parsed contains the unique. It would parse name_unique into the name field.

It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GHC makes fresh type variables for each such thing, which is fine.

We can reserve a special unique for dummyName, but I'll just go with any.

It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though thinking

🤦

Ok, then I think the easiest solution to restore old behavior is to quote around name_unique, i.e., `foo_i0` instead of `foo`_i0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree it's super ugly, and this now just makes the ugliness very apparent... not that that helps 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an issue about pretty-printing of uniques. If anybody feels like taking over, that would be highly appreciated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated based on comments above.

@zliu41 zliu41 requested a review from michaelpj April 26, 2023 03:03
@michaelpj
Copy link
Contributor

Has conflicts.

Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review in detail, but seems good to me.

@@ -55,6 +61,31 @@ data Name = Name
deriving stock (Show, Generic, Lift)
deriving anyclass (NFData, Hashable)

-- | Allowed characters in the starting position of a non-quoted identifier.
isIdentifierStartingChar :: Char -> Bool
isIdentifierStartingChar c = isAscii c && isAlpha c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason why no _?

@@ -1,6 +1,6 @@
(Left Type mismatch at () in term
'{ (builtin ifThenElse) (con integer) }'.
Expected type
'(all *_0 (type) *_0)',
'(all `*`_0 (type) `*`_0)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just fix those stupid error messages and remove the dummy stuff entirely. Could you create a ticket for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@kwxm kwxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the status of this? I just noticed that it was on my list of things to review: I did review it at the time but I think I forgot to click the approve button. It seems harmless and easy to change if we later decide that we don't like something, so it looks safe to merge (although it's probably got a bit out of sync by now).

Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the status of this?

Oh, it completely slipped my mind. I think it's ready to merge.

@@ -1,6 +1,6 @@
(Left Type mismatch at () in term
'{ (builtin ifThenElse) (con integer) }'.
Expected type
'(all *_0 (type) *_0)',
'(all `*`_0 (type) `*`_0)',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zliu41 zliu41 merged commit c4a404f into master May 24, 2023
@zliu41 zliu41 deleted the zliu41/symbols branch May 24, 2023 21:58
v0d1ch pushed a commit to v0d1ch/plutus that referenced this pull request Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants