-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow symbols in PLC names #5218
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have very mixed feelings about this:
- can it clash with the readable pretty-printer? Maybe the latter omits spaces sometimes
$f
,$c
etc stuff is noisy and not helpful, it would be ideal if we parsed it and turned it into something readable- I think it's an improvement overall, because I didn't like throwing away useful information, in particular turning
==
intobad_name
etc
I feel like the best solution would be to allow arbitrary ASCII symbols in identifies, but quote those identifiers that are not composed or alphanumerics, _
and '
. We could use `
in the beginning of an identifier containing symbols to express "everything up until the next space is a single identifier". And yeah, I would like us to parse $f
, $c
etc.
I do like getting rid of Main question: does the parser handle these? I don't think we should allow non-parseable identifiers.
I'm not sure what you mean by that? Do you mean "use our domain knowledge of GHC to reverse-engineer the meaning of those names and emit a more human readable version"? e.g. "$c becomes '_specialized'" or something? I'm not at all sure about that - I think it's a lot simpler to just reflect what we get from GHC; then at least knowledge about GHC transfers over, rather than it being a mix of GHC and our (probably partial) attempts to back-translate it. |
Yes, the PR includes update to the parser and tests. |
Yes.
Well, either we back-translate it, however partially, or all our users who care about performance (i.e. all users who actually deploy the code?) individually. |
There's no need to have space between identifier names and non-identifier names (e.g., brackets). If it omits spaces between two things that both can be identifier names, then it is the pretty printer that should be fixed.
Perhaps, but on the other hand these are standard in GHC Core, so will look more familiar to people who are used to it.
Not only that, but currently, |
If we do that then we need to make sure there's a space after an identifier name, not a closing bracket or anything else. Also, to me that is a very unusual way of doing identifier names and I'm not aware of any other language that does that. |
I like this because it would allow me to have built-in function names with dots in them, which I was asking about the other day! I think there's no danger of breaking anything because the syntax of Plutus Core, for all its faults, is very minimal and doesn't make any use of the symbols that this PR makes valid in names. However, one argument against it might be that if someone like me was to go ahead and define builtins with strange characters in their names (which I think can be done simply by producing a hand-written On the whole, I think I'm in favour of it because of the GHC compatibility problem that it solves. |
@@ -114,7 +114,19 @@ inBraces :: Parser a -> Parser a | |||
inBraces = between (symbol "{") (char '}') | |||
|
|||
isIdentifierChar :: Char -> Bool | |||
isIdentifierChar c = isAlphaNum c || c == '_' || c == '\'' | |||
isIdentifierChar c = isAlphaNum c || elem c identifierSymbols |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could complain that this should be called isIdentifierNonStartingChar
or something, but that would be horrible.
Sorry, I should've said "a delimiter" instead of "a space", of course.
Some languages have quoted identifiers, except those languages usually mark both the sides of a quoted identifier with a symbol. In PureScript the symbol is As for marking only one side, LISP uses a single So I thought we could combine the two strategies. But perhaps they don't combine well indeed and it's worth to mark an identifier at both of its sides to quote it. |
@effectfully I like this and I updated the PR to do so. The additional backticks are not quite aesthetically pleasing, and it adds some complexity to the parser, but it is a more robust way of allowing symbols in names. |
This reverts commit a206f12.
@@ -17,8 +17,9 @@ support unicode identifiers as well. | |||
|
|||
replacements :: [(T.Text, T.Text)] | |||
replacements = [ | |||
-- this helps with module prefixes | |||
(".", "_") | |||
("-", "minus") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
colon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed replacements
. I don't think we need it any more.
strUnquoted <- takeWhileP (Just "identifier-quoted") isQuotedIdentifierChar | ||
void $ char backTick | ||
let str = T.cons backTick (T.snoc strUnquoted backTick) | ||
Name str <$> intern str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I slightly disagree with this approach. This approach says that the parsed name is the string including the backticks. That seems wrong to me: the backticks are part of the syntax of names, that allow us to give an actual name which has special characters in it. The backticks themselves are not part of the name.
If we view them as syntax then we do things slightly differently:
- The name we parse here is without the backticks
- When we prettyprint a name, we need to decide what syntax to use depending on whether it can be printed unquoted or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scala is a language I know that supports quoted names. I thought foo
and `foo`
are different names, but I was wrong - they are the same name. So I agree with you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelpj Updated. The only slight problem is that (all *_0 (type) *_0)
becomes (all `*`_0 (type) `*`_0)
(see e1b585e). The *
comes from dummyTyname
. Can we use something else there, say a
? That would be less confusing anyway since *
usually refers to kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hilarious. The issue is that we don't want to use a
because it's supposed to mean "some type", i.e. each can be distinct. That's not exactly clear anyway, though (you thought it was " * = type" rather than "* = anything"). Maybe we should use ?
? Still has the quoting problem, though...
I'm also unsure about whether or not the _unique
part should be "part of the name" or not 🤔 I think as it is it now won't parse again? Whereas before it would have done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we don't want to use a because it's supposed to mean "some type", i.e. each can be distinct.
We can still use a
for that. I think GHC invents type variables on similar situations, rather than using *
:
ghci> let x = (3 :: Int) False
<interactive>:1:9: error:
• Couldn't match expected type ‘Bool -> t’ with actual type ‘Int’
or we can use any
I'm also unsure about whether or not the _unique part should be "part of the name" or not thinking I think as it is it now won't parse again? Whereas before it would have done.
I think the behavior of Parser Name
has always been wrong. If the name being parsed contains the unique. It would parse name_unique
into the name
field.
Now it is no worse than before. If we want to parse name_unique
then we need to fix it separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still use a for that. I think GHC invents type variables on similar situations, rather than using *
GHC makes fresh type variables for each such thing, which is fine. We'd have to do something similar, which complicates things.
any
seems okay.
I think the behavior of Parser Name has always been wrong. If the name being parsed contains the unique. It would parse name_unique into the name field.
It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GHC makes fresh type variables for each such thing, which is fine.
We can reserve a special unique for dummyName
, but I'll just go with any
.
It's kind of dumb, but this is sort of fine when we use that syntax to print uniques: then it effectively produces globally unique names, which parses fine! Not sure if anyone relies on that, though thinking
🤦
Ok, then I think the easiest solution to restore old behavior is to quote around name_unique
, i.e., `foo_i0`
instead of `foo`_i0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree it's super ugly, and this now just makes the ugliness very apparent... not that that helps 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an issue about pretty-printing of uniques. If anybody feels like taking over, that would be highly appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated based on comments above.
Has conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't review in detail, but seems good to me.
@@ -55,6 +61,31 @@ data Name = Name | |||
deriving stock (Show, Generic, Lift) | |||
deriving anyclass (NFData, Hashable) | |||
|
|||
-- | Allowed characters in the starting position of a non-quoted identifier. | |||
isIdentifierStartingChar :: Char -> Bool | |||
isIdentifierStartingChar c = isAscii c && isAlpha c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason why no _
?
@@ -1,6 +1,6 @@ | |||
(Left Type mismatch at () in term | |||
'{ (builtin ifThenElse) (con integer) }'. | |||
Expected type | |||
'(all *_0 (type) *_0)', | |||
'(all `*`_0 (type) `*`_0)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just fix those stupid error messages and remove the dummy
stuff entirely. Could you create a ticket for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the status of this? I just noticed that it was on my list of things to review: I did review it at the time but I think I forgot to click the approve
button. It seems harmless and easy to change if we later decide that we don't like something, so it looks safe to merge (although it's probably got a bit out of sync by now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the status of this?
Oh, it completely slipped my mind. I think it's ready to merge.
@@ -1,6 +1,6 @@ | |||
(Left Type mismatch at () in term | |||
'{ (builtin ifThenElse) (con integer) }'. | |||
Expected type | |||
'(all *_0 (type) *_0)', | |||
'(all `*`_0 (type) `*`_0)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't seem harmful, and improves readability.