Int128 Parsing support [PART 1] #11196
BlobCodes wants to merge 18 commits into crystal-lang:master
# Specs ported from compiler-rt

private def test__divti3(a : Int128, b : Int128, expected : Int128, file = __FILE__, line = __LINE__)
  it "passes compiler-rt builtins unit tests" do
I know this is the same design as in the existing mulodi4_spec.cr, so don't feel obliged to change this. But I don't think there's much reason for having many individual examples (with non-descriptive names). I'd propose placing all related expectations in a single example.
Placing them all in one example will show only the first failure instead of all of them.
Sure, but that doesn't really matter much. They're all expected to pass anyways.
Of course they are :D What I'm saying is that you want to know exactly which ones are failing if they don't.
Include U/Int128 popcount spec Co-authored-by: Johannes Müller <straightshoota@gmail.com>
|
The symbols missing on win32 are part of compiler-rt, but libgcc provides them as well, meaning they are always available as low-level implementations when linking with
|
I found a hack to return the lower 64 bits of a 128-bit integer in the compiler-rt funs.
|
Finally, all checks passed! |
|
Hmm.. I just found a breaking change.. Now I can either introduce a breaking change (which is probably not wanted), use BTW: Why is there no compiler spec for this? |
|
I don't know why |
The String#to_i implementation has a check built in just for this. Line 595 in a067d06 It should be very easy to change this behaviour. |
I'm having a hard time thinking up a use case where multiple _ would be wanted, so why allow it? However, it fails in more cases: OTOH, prints . |
puts "1_1".to_i(underscore: true) #=> 11
I think it's okay to allow multiple sequential underscores, because you have to manually allow it anyways
|
I don't think we should allow multiple consecutive underscores in numbers. If that's the case right now, it's a bug. |
|
Btw there's another thing about current integer parsing which I think is strange:
012 #=> Error: octal constants should be prefixed with 0o
0_12 #=> 12
Is this also a bug or should this stay like it is?
|
Or this bug in number parsing:
-0_u64 #=> Invalid UInt32: -0 (ArgumentError); you've found a bug in the Crystal compiler.
-0u64 #=> 0
|
I've extracted the underscore discussion to #11203 to stay focused on the PR here. The examples in the previous two posts all look like legitimate bugs to me. If this PR happens to fix them, that sounds good to me. |
|
Hi @BlobCodes, thanks for the hard work you put into this! If I may ask one more thing from you, would it be possible to break this PR up even more, so we can look closely at each bit? I'd say one PR for every bullet point in the description would be optimal. I can try to do it myself if you prefer. 🙏
|
This new commit completely refactors number parsing in the lexer. The lexer.cr in this PR is now 331 LOC lighter than the lexer.cr in master. All bugs mentioned above have been fixed. Some new rules were created:
# Before
1_.1 #=> 1.1
1_e2 #=> 100.0
-0u64 #=> 0_u64
-0_u64 #=> Invalid UInt32: -0 (ArgumentError); you've found a bug in the Crystal compiler.
1__2 #=> 12
0x_2 #=> 2
0_12 #=> 12
# After
1_.1 #=> Error: trailing '_' in number
1_e2 #=> Error: trailing '_' in number
-0u64 #=> Error: Invalid negative value -0 for UInt64
-0_u64 #=> Error: Invalid negative value -0 for UInt64
1__2 #=> Error: trailing '_' in number
0x_2 #=> Error: numeric literal without digits
0_12 #=> Error: octal constants should be prefixed with 0o
The two new rules (and error messages) were taken from ruby.
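For readers who want to experiment with the underscore rule, here is a minimal sketch of a checker that agrees with the decimal examples in the before/after listing. It is not the lexer's code: `underscore_error` is a hypothetical helper, it only looks at decimal literals, and the assumption that an underscore may precede a type suffix (`i`, `u`, `f`) is inferred from `-0_u64` being rejected for its sign rather than for its underscore.

```crystal
# Hypothetical helper for illustration only; NOT the code from lexer.cr.
# Returns an error message if an underscore is misplaced, or nil otherwise.
def underscore_error(literal : String) : String?
  chars = literal.chars
  chars.each_with_index do |char, i|
    next unless char == '_'
    following = chars[i + 1]?
    # Assumption: an underscore must be followed by a digit or by the
    # first letter of a type suffix (i32, u64, f64, ...).
    ok = following && (following.ascii_number? || "iuf".includes?(following))
    return "trailing '_' in number" unless ok
  end
  nil
end

puts underscore_error("1_000").inspect
puts underscore_error("1_.1").inspect
puts underscore_error("1__2").inspect
puts underscore_error("1_e2").inspect
puts underscore_error("0_u64").inspect
```

The hex-prefix and octal cases (`0x_2`, `0_12`) produce different messages in the listing and are deliberately left out of this sketch.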
Alright, I'll split it up. |
|
Let's do one after the other. That's cleaner. |
I'm not sure these changes are actually correct. IMO underscores should be allowed at any place in a number literal, including around decimal separator and literal base prefix. I don't think there is any harm in doing that, but it could allow more versatile use cases. I see no reason to put unnecessary restrictions if the intention is clear and unambiguous. In Rust, all those literals are valid except for the unsigned |
Hmm.. In ruby, those are all throwing errors. |
|
( I don't see any convincing argument why the literals If somebody wants to propose such a change, they should start a dedicated discussion about that. But let's not mingle it with 128-bit support. |
|
They certainly look like errors, are super-rarely used (if at all), and are undocumented, so I don't get why they shouldn't be made errors, which they are according to the non-existing specs and documentation, i.e. they were never designed to work that way.
Hmm.. yeah, that's fair. But allowing int128 parsing requires a lexer refactor anyways, so I think there should be a discussion about this (because it's not that much added work). |
"including around decimal separator" With String.to_i already raising when an integer has multiple underscores, I think this should be considered an error in the lexer too (Ary even called it a bug).
I can kind of understand that you think Anyways, the lexer refactor has been separated into #11211 - let's continue the discussions there
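The String#to_i behaviour referred to in this thread can be checked with a short snippet. This is only a sketch of what the comments above describe (consecutive underscores being rejected even with `underscore: true`); the nilable `to_i?` variant is used so the snippet does not raise, and results may differ between Crystal versions.

```crystal
# Underscores are only accepted when explicitly enabled.
puts "1_000".to_i?.inspect
puts "1_000".to_i?(underscore: true).inspect

# Per the discussion above, two consecutive underscores are rejected.
puts "1__000".to_i?(underscore: true).inspect
```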
|
Is this PR still usable or was it superseded? Should it be closed? |
There are still some changes in this PR that are not included anywhere else. Also, #11211 includes some bug fixes to integer parsing and some opinionated changes which still need to be discussed in #11203 and #11214 |
|
Replaced by #11571 |
This PR adds:
With this you can do crazy things like:
This is a subset of #11111 only including those changes not making the CI fail on crystal v1.1.1.
TODO:
As far as I have read now, the int128 methods on MSVC require the use of SIMD (result expected to be placed in SSE register XMM0) - which crystal sadly does not explicitly support.
To not introduce breaking changes, the method deduce_integer_kind now uses the following priority: Int32 > Int64 > UInt64 > Int128 > UInt128. I don't know if this should be changed.
Requires/Includes #11093
Related to #8373
Supersedes #10975
Closes #9516
Closes #7915
Related to #5545
Closes #11191
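The priority order given for deduce_integer_kind can be illustrated with a small sketch. This is not the compiler's implementation: `first_fitting_kind` is a hypothetical helper that tries the String#to_i family in the stated order and reports the first type that can hold the value.

```crystal
# Hypothetical helper for illustration; NOT the compiler's deduce_integer_kind.
# Tries Int32 > Int64 > UInt64 > Int128 > UInt128 and returns the first fit.
def first_fitting_kind(digits : String) : Symbol?
  return :i32 if digits.to_i32?
  return :i64 if digits.to_i64?
  return :u64 if digits.to_u64?
  return :i128 if digits.to_i128?
  return :u128 if digits.to_u128?
  nil # too large even for UInt128
end

puts first_fitting_kind("1")
puts first_fitting_kind("2147483648")           # Int32::MAX + 1
puts first_fitting_kind("9223372036854775808")  # Int64::MAX + 1
puts first_fitting_kind("18446744073709551616") # UInt64::MAX + 1
```

Because the signed type is tried before the unsigned one at each width, a value is only reported as unsigned when no signed type of that width can hold it, which matches the stated goal of not changing what existing literals resolve to.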