Report syntax error for too-long bin/hex/oct integer literals by oprypin · Pull Request #11447 · crystal-lang/crystal

oprypin · 2021-11-12T23:02:36Z

Currently the OverflowError (for any number that doesn't fit into UInt64 as it's being constructed) causes a compiler crash.

Change the operations to bitwise to remove the actual overflow detection but explicitly deduce that an overflow must have happened, through the digit count.

Add the actual number literal to the error message (note: needed to ensure that the literal is fully read before declaring the overflow).

p 0x10000000000000000

Arithmetic overflow (OverflowError)
  from ???

Now:

Error: 0x10000000000000000 doesn't fit in an UInt64

There's something to be said about the fact that the errors are still not unified depending if the number happens to cross the size of UInt64 or not:

p 0x1000000000000i32

Error: 281474976710656 doesn't fit in an Int32

(both before and after, so this is not affected)

Currently the OverflowError causes a compiler crash. Change the operations to bitwise to remove the actual overflow detection but explicitly deduce that an overflow must have happened, through the digit count. Add the actual number literal to the error message (note: needed to ensure that the literal is fully read before declaring the overflow).

straight-shoota · 2021-11-14T19:16:40Z

It feels odd that an explicitly typed literal such as 0xFFFFFFFFFFFFFFFFFFFFFFFF_i8 would error as "... doesn't fit in an UInt64". That we parse number literals as UInt64 is an implementation detail. Leaking that in the error message is confusing because the type is specified as Int8, not UInt64.

oprypin · 2021-11-14T19:23:16Z

@straight-shoota I agree and acknowledge it, but implementing the proper detection is very difficult, so I didn't opt for it. Right now what's being leaked is a compiler error so I figured it's enough of an improvement. Or, well, I wouldn't call it "leaking" even, because it's impossible to rely on syntax error messages.

straight-shoota · 2021-11-14T19:46:05Z

Sure, it's certainly an improvement. But why not improve it even more? We don't even need to fix printing the actual literal type in the error message. If that's too hard to do, I'd rather just omit UInt64 as an improvement and just say somthing like 0xFFFFFFFFFFFFFFFFFFFFFFFF is an invalid literal.

straight-shoota · 2021-11-14T19:49:53Z

This might actually not be that hard to fix properly, though. I think we could just pass a flag to finish_scan_prefixed_number that specifies whether the literal value is invalid and if it is, that method raises after reading the type suffix.

oprypin · 2021-11-14T20:45:51Z

Something to point out is that it's not just the error itself having an arbitrary restriction, literals over UInt64 could work but are arbitrarily not implemented for bin/hex/oct, and this at least gives a related piece of information.

Anyway, I'll think of a full fix some time next week.

oprypin · 2021-11-19T15:14:32Z

OK I did that. I had to duplicate the conversion from :i16 to Int16 (etc.) because it's buried so deep.

src/compiler/crystal/syntax/lexer.cr

straight-shoota · 2021-11-19T16:32:58Z

src/compiler/crystal/syntax/lexer.cr

-    def finish_scan_prefixed_number(num, negative, start)
+    def finish_scan_prefixed_number(num : Int?, negative : Bool, start : Int32)
+      if num.nil? # Doesn't even fit in UInt64
+        string_value = string_range_from_pool(start).gsub("_", "")


Why modify the literal string with gsub?

There's no other way to get rid of the underscores that might be there. I needed to get rid of trailing underscore, but figured that in the middle they're also not good. Do you suggest something else?

I don't understand why you want to get rid of any underscores. I would expect to see the literal exactly as it's written.

So, then notice this dichotomy:

p 0x1234_1234_1234_i32 Error: 20014853001780 doesn't fit in an Int32

(that is the case both before and after^)

This is what you're suggesting:

p 0x1234_1234_1234_1234_1234_i32 Error: 0x1234_1234_1234_1234_1234_ doesn't fit in an Int32

This is what I have for the moment

p 0x1234_1234_1234_1234_1234_i32 Error: 0x12341234123412341234 doesn't fit in an Int32

Which do you prefer?

We could also change even the pre-existing case to be like that.
Say,

p 0x1234_1234_1234_i32 Error: 0x1234_1234_1234 doesn't fit in an Int32

But that's an even bigger change.

I don't follow. The first error message is weird and confusing. It presents the literal in a different format than it is written.

I would like to print the literal verbatim:

0x1234_1234_1234_1234_1234_i32 # => Error: 0x1234_1234_1234_1234_1234 doesn't fit in an Int32

My main point is, I intended to not touch the case where the literal happens to fit UInt64. Because that's not what I originally opted to work on.
The example I show is pre-existing, doesn't change with the current state of this PR.

So, then notice this dichotomy:

p 0x1234_1234_1234_i32 Error: 20014853001780 doesn't fit in an Int32

(that is the case both before and after^)

Do you think that should be changed too, as per
#11447 (comment) ?

Yes, I would like that to change. It doesn't need to be in this PR, though.
But we should not go in a different direction with the changes that are in this PR.

OK so fixed as per your comments, but not changing the pre-existing error message.

Yeah, that's totally fine. Thanks.

src/compiler/crystal/syntax/lexer.cr

straight-shoota · 2021-11-20T11:07:51Z

src/compiler/crystal/syntax/lexer.cr


      # 0o177777_77777777_77777777 is the largest UInt64.
-      num = nil if num_size > 22 # or > 21 with first digit being 2 through 7
+      num = nil if {num_size, first_digit} > {22, 0o1}


This condition looks unconventional and I find it a little harder to read. Maybe this would be a better way to write this?

Suggested change

num = nil if {num_size, first_digit} > {22, 0o1}

num = nil if num_size > 22 || (num_size == 22 && first_digit == 1)

It's fine by me to keep it that way, though.

Well because it's the latest idea of mine, obviously I prefer to keep it 😀
Not a big deal though, others can weigh in as well.

Looks dandy, but to be honest I can't say for sure what's happening under the hood without consulting docs on Tuple#<=>, not to mention redundant Tuple allocation. IMO it would be more lightweight and understandable if it was written using the more explicit version :)

sdogruyol

Thank you @oprypin 🙏

HertzDevil added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser labels Nov 13, 2021

oprypin added 2 commits November 19, 2021 15:38

Merge remote-tracking branch 'origin/master' into parsnu

85409b5

Take the size suffix into account

a8da279

straight-shoota reviewed Nov 19, 2021

View reviewed changes

src/compiler/crystal/syntax/lexer.cr Outdated Show resolved Hide resolved

oprypin added 3 commits November 19, 2021 18:07

Octal digits go only up to 7...

4883093

Preserve the original literal, with suffix

e28fb57

Refactor the check for first digit

157bf29

straight-shoota approved these changes Nov 20, 2021

View reviewed changes

sdogruyol approved these changes Nov 26, 2021

View reviewed changes

straight-shoota added this to the 1.3.0 milestone Nov 26, 2021

straight-shoota merged commit 6eb0bb1 into crystal-lang:master Nov 29, 2021

BlobCodes mentioned this pull request Nov 30, 2021

Lexer number parsing refactor #11211

Merged

BlobCodes added a commit to BlobCodes/crystal that referenced this pull request Dec 1, 2021

Refactor to apply rules in crystal-lang#11447 and fix bugs

724585c

	num = nil if {num_size, first_digit} > {22, 0o1}
	num = nil if num_size > 22 \|\| (num_size == 22 && first_digit == 1)

Uh oh!

Conversation

oprypin commented Nov 12, 2021

Uh oh!

straight-shoota commented Nov 14, 2021

Uh oh!

oprypin commented Nov 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

straight-shoota commented Nov 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

straight-shoota commented Nov 14, 2021

Uh oh!

oprypin commented Nov 14, 2021

Uh oh!

oprypin commented Nov 19, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

straight-shoota Nov 20, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sdogruyol left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

oprypin commented Nov 14, 2021 •

edited

Loading

straight-shoota commented Nov 14, 2021 •

edited

Loading

oprypin commented Nov 19, 2021 •

edited

Loading

straight-shoota Nov 20, 2021 •

edited

Loading