Skip to content

Report syntax error for too-long bin/hex/oct integer literals#11447

Merged
straight-shoota merged 6 commits intocrystal-lang:masterfrom
oprypin:parsnu
Nov 29, 2021
Merged

Report syntax error for too-long bin/hex/oct integer literals#11447
straight-shoota merged 6 commits intocrystal-lang:masterfrom
oprypin:parsnu

Conversation

@oprypin
Copy link
Member

@oprypin oprypin commented Nov 12, 2021

Currently the OverflowError (for any number that doesn't fit into UInt64 as it's being constructed) causes a compiler crash.

Change the operations to bitwise to remove the actual overflow detection but explicitly deduce that an overflow must have happened, through the digit count.

Add the actual number literal to the error message (note: needed to ensure that the literal is fully read before declaring the overflow).


p 0x10000000000000000
Arithmetic overflow (OverflowError)
  from ???

Now:

Error: 0x10000000000000000 doesn't fit in an UInt64

There's something to be said about the fact that the errors are still not unified depending if the number happens to cross the size of UInt64 or not:

p 0x1000000000000i32
Error: 281474976710656 doesn't fit in an Int32

(both before and after, so this is not affected)

Currently the OverflowError causes a compiler crash.

Change the operations to bitwise to remove the actual overflow detection but explicitly deduce that an overflow must have happened, through the digit count.

Add the actual number literal to the error message (note: needed to ensure that the literal is fully read before declaring the overflow).
@HertzDevil HertzDevil added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser labels Nov 13, 2021
@straight-shoota
Copy link
Member

It feels odd that an explicitly typed literal such as 0xFFFFFFFFFFFFFFFFFFFFFFFF_i8 would error as "... doesn't fit in an UInt64". That we parse number literals as UInt64 is an implementation detail. Leaking that in the error message is confusing because the type is specified as Int8, not UInt64.

@oprypin
Copy link
Member Author

oprypin commented Nov 14, 2021

@straight-shoota I agree and acknowledge it, but implementing the proper detection is very difficult, so I didn't opt for it. Right now what's being leaked is a compiler error so I figured it's enough of an improvement. Or, well, I wouldn't call it "leaking" even, because it's impossible to rely on syntax error messages.

@straight-shoota
Copy link
Member

straight-shoota commented Nov 14, 2021

Sure, it's certainly an improvement. But why not improve it even more? We don't even need to fix printing the actual literal type in the error message. If that's too hard to do, I'd rather just omit UInt64 as an improvement and just say somthing like 0xFFFFFFFFFFFFFFFFFFFFFFFF is an invalid literal.

@straight-shoota
Copy link
Member

This might actually not be that hard to fix properly, though. I think we could just pass a flag to finish_scan_prefixed_number that specifies whether the literal value is invalid and if it is, that method raises after reading the type suffix.

@oprypin
Copy link
Member Author

oprypin commented Nov 14, 2021

Something to point out is that it's not just the error itself having an arbitrary restriction, literals over UInt64 could work but are arbitrarily not implemented for bin/hex/oct, and this at least gives a related piece of information.

Anyway, I'll think of a full fix some time next week.

@oprypin
Copy link
Member Author

oprypin commented Nov 19, 2021

OK I did that. I had to duplicate the conversion from :i16 to Int16 (etc.) because it's buried so deep.

def finish_scan_prefixed_number(num, negative, start)
def finish_scan_prefixed_number(num : Int?, negative : Bool, start : Int32)
if num.nil? # Doesn't even fit in UInt64
string_value = string_range_from_pool(start).gsub("_", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why modify the literal string with gsub?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no other way to get rid of the underscores that might be there. I needed to get rid of trailing underscore, but figured that in the middle they're also not good. Do you suggest something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you want to get rid of any underscores. I would expect to see the literal exactly as it's written.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, then notice this dichotomy:

p 0x1234_1234_1234_i32
Error: 20014853001780 doesn't fit in an Int32

(that is the case both before and after^)

This is what you're suggesting:

p 0x1234_1234_1234_1234_1234_i32
Error: 0x1234_1234_1234_1234_1234_ doesn't fit in an Int32

This is what I have for the moment

p 0x1234_1234_1234_1234_1234_i32
Error: 0x12341234123412341234 doesn't fit in an Int32

Which do you prefer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also change even the pre-existing case to be like that.
Say,

p 0x1234_1234_1234_i32
Error: 0x1234_1234_1234 doesn't fit in an Int32

But that's an even bigger change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. The first error message is weird and confusing. It presents the literal in a different format than it is written.

I would like to print the literal verbatim:

0x1234_1234_1234_1234_1234_i32 # => Error: 0x1234_1234_1234_1234_1234 doesn't fit in an Int32

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main point is, I intended to not touch the case where the literal happens to fit UInt64. Because that's not what I originally opted to work on.
The example I show is pre-existing, doesn't change with the current state of this PR.

So, then notice this dichotomy:

p 0x1234_1234_1234_i32
Error: 20014853001780 doesn't fit in an Int32

(that is the case both before and after^)

Do you think that should be changed too, as per
#11447 (comment) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would like that to change. It doesn't need to be in this PR, though.
But we should not go in a different direction with the changes that are in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so fixed as per your comments, but not changing the pre-existing error message.

Copy link
Member

@straight-shoota straight-shoota Nov 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's totally fine. Thanks.


# 0o177777_77777777_77777777 is the largest UInt64.
num = nil if num_size > 22 # or > 21 with first digit being 2 through 7
num = nil if {num_size, first_digit} > {22, 0o1}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition looks unconventional and I find it a little harder to read. Maybe this would be a better way to write this?

Suggested change
num = nil if {num_size, first_digit} > {22, 0o1}
num = nil if num_size > 22 || (num_size == 22 && first_digit == 1)

It's fine by me to keep it that way, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well because it's the latest idea of mine, obviously I prefer to keep it 😀
Not a big deal though, others can weigh in as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks dandy, but to be honest I can't say for sure what's happening under the hood without consulting docs on Tuple#<=>, not to mention redundant Tuple allocation. IMO it would be more lightweight and understandable if it was written using the more explicit version :)

Copy link
Member

@sdogruyol sdogruyol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @oprypin 🙏

@straight-shoota straight-shoota added this to the 1.3.0 milestone Nov 26, 2021
@straight-shoota straight-shoota merged commit 6eb0bb1 into crystal-lang:master Nov 29, 2021
BlobCodes added a commit to BlobCodes/crystal that referenced this pull request Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants