Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change handling of “blank” identifiers #5282

Merged
merged 3 commits into from
Aug 18, 2024

Conversation

sellout
Copy link
Contributor

@sellout sellout commented Aug 16, 2024

Overview

The proximate bug was that the lexer for Blank would “steal” the first segment of an identifier. E.g., _a.blah would be lexed as [Blank "a", WordyId ".blah"], rather than [WordyId "_a.blah"]. However, fixing that directly was fragile, and left other cases like the one in #4681.

Also, most uses of Blank exactly mirrored WordyId, often rebuilding the original segment.

This PR removes the Blank token, instead treating it as any other WordyId. The parser now checks identifiers for “blankness” as needed.

Fixes #2822.

Implementation notes

There were two places that treated Blank differently than WordyId, and those are preserved. There were also two places where a “true” Blank (_) was treated differently than a suffixedBlank (_withSomeSuffix), and those have been eliminated.

I considered storing “blankness” in an extra field of WordyId, but that suffers from boolean blindness and also means that it’s possible to create inconsistent terms (e.g., WordyId IsBlank "definitely.not.blank").

Test coverage

There is a new transcript with examples pulled from both #2822 and #4681, as well as other coverage I thought was useful.

Previously, they were tokenized separately from other identifiers, but
then most handling checked both tokens anyway. This now always parses
“blanks” as normal identifiers and checks their blankness at the few
places we care about it.

There were two places that treated `Blank` differently than `WordyId`,
and those are preserved. There were also two places where `Blank ""`
(`_`) was treated differently than `Blank n` (`_withSomeSuffix`), and
those have been eliminated.

Fixes unisonweb#2822.
This makes it much easier to read the output when debugging the lexer.
And it should be `Read`-compatible..

There’s still room for improvement, though:
```haskell
Block (Open "scratch.u")
  [
    [
      Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "dontMap"} :| [])))),
      Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "f"} :| [])))),
      Block (Open "=")
        [
          [
            Block (Open "cases")
              [
                [
                  Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "None"} :| [])))),
                  Block (Open "->")
                    [
                      [
                        Leaf (Reserved "false"),
                      ],
                    ]
                    (Just Close),
                  Leaf (Semi True),
                ],
                [
                  Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "Some"} :| [])))),
                  Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "_unused"} :| [])))),
                  Block (Open "->")
                    [
                      [
                        Leaf (WordyId (NameOnly (Name Relative (NameSegment {toUnescapedText = "f"} :| [])))),
                        Leaf (Numeric "2"),
                      ],
                    ]
                    (Just Close),
                ],
              ]
              (Just Close),
          ],
        ]
        (Just Close),
    ],
  ]
  (Just Close)
```
Comment on lines +107 to +110
I couldn't figure out what _used refers to here:

3 | Some _used -> f _used

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could give a hint here, when an unknown identifier matches an unbound pattern var, but it doesn't have to be in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. I don’t think it should be in this PR, because it’s not a parser issue, and this behavior already existed.

And this is another case where using color isn’t enough to convey the error. It should be more like

      3 |   Some _used -> f _used
                            ^^^^^

(So, another vote for integrating Diagnose.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(So, another vote for integrating Diagnose.)

Whoa that looks crazy. I'm interested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long ago we had started implementing something like what you showed, but we weren't good at it and we scrapped it. But off-the-shelf sounds nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give credit where it‘s due, @ChrisPenner is the one who shared it initially (https://discord.com/channels/862108724948500490/1200119435931942994/1261743434704879796).

@aryairani aryairani merged commit e388786 into unisonweb:trunk Aug 18, 2024
20 checks passed
@sellout sellout deleted the fix-blank-identifiers branch August 19, 2024 04:53
@ChrisPenner
Copy link
Contributor

Ooh, does this now allow _ as a type-hole? If so I'm stoked on it 🙌🏼

@sellout
Copy link
Contributor Author

sellout commented Aug 20, 2024

Ooh, does this now allow _ as a type-hole? If so I'm stoked on it 🙌🏼

If you mean

  I couldn't figure out what _ refers to here:
  
      4 | > x _ +3
  
  I think its type should be:
  
      Int

instead of

  I got confused here:
  
      4 | > x _ +3
  
  
  I was surprised to find a  here.
  I was expecting one of these instead:
  
  * and
  * bang
  * …

then yes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inability to reference a term or type with a name that has segments starting with an underscore
3 participants