Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display better snippet for invalid char literal #30763

Merged
merged 1 commit into from
Jan 15, 2016

Conversation

gchp
Copy link
Contributor

@gchp gchp commented Jan 7, 2016

This is achieved by adding the scan_back method. This method looks back
through the source_text of the StringReader until it finds the target
char, returning it's offset in the source. We use this method to find
the offset of the opening single quote, and use that offset as the start
of the error.

Given this code:

fn main() {
    let _ = 'abcd';
}

The compiler would give a message like:

error: character literal may only contain one codepoint: ';
let _ = 'abcd';
             ^~

With this change, the message now displays:

error: character literal may only contain one codepoint: 'abcd';
let _ = 'abcd';
        ^~~~~~~

Fixes #30033

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@apasel422
Copy link
Contributor

Ideally, the span would not include the ; there. Also, can you add a compile-fail test for this?

@gchp
Copy link
Contributor Author

gchp commented Jan 7, 2016

@apasel422 ok - sure thing.

@nagisa
Copy link
Member

nagisa commented Jan 7, 2016

To me it seems we should investigate why start does not contain the correct position, and if it is not possible to fix start, add a comment explaining why.

@gchp
Copy link
Contributor Author

gchp commented Jan 10, 2016

Investigating the failing test now.

@nagisa if I understand it correctly, start does not contain the correct position because we have already parsed past the opening single quote trying to look for a closing single quite (which in this case is not found). So I think it makes sense that start doesn't point to the opening single quote position?

@alexcrichton
Copy link
Member

I agree with @nagisa that the start variable should indeed be the start of the character constant, and in theory we shouldn't need to scan backwards looking for the end (e.g. we just parsed it and should have the position as well).

Would it be possible to fix the span without the scan backwards function?

@gchp
Copy link
Contributor Author

gchp commented Jan 12, 2016

@alexcrichton sure thing. I'll dig into it and update the PR!

@nagisa
Copy link
Member

nagisa commented Jan 12, 2016

So I think it makes sense that start doesn't point to the opening single quote position?

You could introduce another variable earlier in the function (or rename the start which gets set to the closing apostrophe) to deal with it, I think.

@gchp
Copy link
Contributor Author

gchp commented Jan 12, 2016

@nagisa I did try that before I ended up with the current iteration. Even at the very beginning of the function the start variable was not in the right place.

I might have just overlooked something though. I'll go back and try it again and see what I find. Will update here when I have something to report.

Thanks for the feedback!

@gchp
Copy link
Contributor Author

gchp commented Jan 13, 2016

@nagisa & @alexcrichton - I've fixed the problem and updated the PR. See the latest commit message for details on the fix. Basically we were exiting the function too early, without checking for a closing single quote.

The latest commit makes the others which preceeded it unecessary. Should I squash them all together? Or is it ok to leave as is?

@bors
Copy link
Contributor

bors commented Jan 13, 2016

☔ The latest upstream changes (presumably #30684) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -1 +1 @@
Subproject commit 95d6a00134f284e6b889d98f4c2cb4b285950327
Subproject commit e0c0bf439add63a6a25a25ba47e4aec9547bf9af
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This showed up when I rebased. Not sure how to make it not show up as part of this pull request...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cd src/liblibc
git checkout 95d6a00134f284e6b889d98f4c2cb4b285950327
cd ../../
git add/commit --amend

is the crudest way which should work, I think.

@gchp
Copy link
Contributor Author

gchp commented Jan 14, 2016

Ok, updated from comments. Let me know what other tests you'd like, if any.
Also let me know if I should squash all these commits.

Thanks!

@nagisa
Copy link
Member

nagisa commented Jan 14, 2016

Yeah, a squash is necessary. Perhaps also a test for '\x00\x00'. r=me once those are done.

@gchp
Copy link
Contributor Author

gchp commented Jan 14, 2016

@nagisa having some issues getting the error right for '\x00\x00. As it stands, it comes out like: ...one codepoint: '\x00\x. Missing the end of the second sequence. I'm not sure how to reliably fix that.

I could try parse forwards until the end of the escape sequence? I'd need to do that any number of times, though depending on the number of sequences found. Take this example:

'\x00\x00\x00\x00'

Should I just recursively parse escape sequences?

@nagisa
Copy link
Member

nagisa commented Jan 14, 2016

My suggestion would be to not parse until the end – you already know what’s the error and where it is happening, you just need to ensure the error is well worded. Current nightly outputs something like:

<anon>:2:5: 2:10 error: unterminated character constant: '\x00
<anon>:2     '\x00\x00'
             ^~~~~

It might be best to fall-back on error of this sort if you can’t immediately tell whether the literal is not terminated (i.e. missing ') or it is too long (i.e. too many characters before the closing ').


EDIT: it could also be that we do not really want the “literal may only contain one codepoint” error and would rather always report the “unterminated character constant” instead and print an associated help message. For example:

<anon>:2:5: 2:7 error: unterminated character constant: 'h
<anon>:2     'hello'
             ^~
<anon>:2:5: 2:10 help: character literals may only contain one codepoint

… at least for the non-lifetime case.

@gchp
Copy link
Contributor Author

gchp commented Jan 14, 2016

@nagisa ok, sounds good. Will update the PR shortly. Thanks!

Given this code:

    fn main() {
        let _ = 'abcd';
    }

The compiler would give a message like:

    error: character literal may only contain one codepoint: ';
    let _ = 'abcd';
                 ^~

With this change, the message now displays:

    error: character literal may only contain one codepoint: 'abcd'
    let _ = 'abcd'
            ^~~~~~

Fixes rust-lang#30033
@nagisa
Copy link
Member

nagisa commented Jan 14, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Jan 14, 2016

📌 Commit acc9428 has been approved by nagisa

@nagisa
Copy link
Member

nagisa commented Jan 14, 2016

Thanks, the PR turned out nicely!

@gchp
Copy link
Contributor Author

gchp commented Jan 14, 2016

Thank you! Appreciate you going through it with me :)

@bors
Copy link
Contributor

bors commented Jan 15, 2016

⌛ Testing commit acc9428 with merge 0cc6f21...

@bors
Copy link
Contributor

bors commented Jan 15, 2016

💔 Test failed - auto-mac-64-nopt-t

@nagisa
Copy link
Member

nagisa commented Jan 15, 2016

@bors retry

Seems like its one of the spurious errors we've been having lately.
On Jan 15, 2016 3:52 AM, "bors" [email protected] wrote:

[image: 💔] Test failed - auto-mac-64-nopt-t
http://buildbot.rust-lang.org/builders/auto-mac-64-nopt-t/builds/7711


Reply to this email directly or view it on GitHub
#30763 (comment).

@bors
Copy link
Contributor

bors commented Jan 15, 2016

⌛ Testing commit acc9428 with merge a70a60a...

bors added a commit that referenced this pull request Jan 15, 2016
This is achieved by adding the scan_back method. This method looks back
through the source_text of the StringReader until it finds the target
char, returning it's offset in the source. We use this method to find
the offset of the opening single quote, and use that offset as the start
of the error.

Given this code:

```rust
fn main() {
    let _ = 'abcd';
}
```

The compiler would give a message like:

```
error: character literal may only contain one codepoint: ';
let _ = 'abcd';
             ^~
```
With this change, the message now displays:

```
error: character literal may only contain one codepoint: 'abcd';
let _ = 'abcd';
        ^~~~~~~
```

Fixes #30033
@bors bors merged commit acc9428 into rust-lang:master Jan 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants