-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: spec: change all int types to panic on wraparound, overflow #19624
Comments
This simplifies bounds check elimination, since guarding-against/proving-impossibility-of overflow in indexing calculations can be tricky. |
Just a datapoint, the code in https://golang.org/src/runtime/hash64.go takes advantage of overflow in almost every line of source (for the uintptr type). Maybe we can't do this until Go 2, but we can do experiments to get some data about it now. We could hack panic on overflow into the compiler today. What happens performance-wise with the go1 benchmarks? How many false positives are there? How many true positives are there? |
From a programmer's point of view, proposal #19623 is a significant simplification (wrap around and overflow disappear, and the new One of Go's ideas is to reduce the intrinsic complexity of the programming language at hand while at the same time provide mechanisms that are general and powerful and thus enable programmer productivity and increased readability. We need to look at language changes not from a compiler writer's point of view, but from a programmer's productivity point of view. I think this proposal would be a step backward in the philosophy of Go. I am also not convinced about the claim that overflow checking is "likely much lower than using arbitrary-precision ints": The cost of arbitrary precision ints is there when one actually uses them, otherwise their cost is similar to what needs to be done for bounds/overflow checking (it's the same test, essentially). There's a grey area for ints that use all 64 (or 32) bits as they will become "big ints" internally (at least one bit is usually reserved to implement "tagged ints" efficiently) - but in code that straddles this boundary and if it matters one might be better off using a sized I'm not in favor of this proposal. |
I think either proposal is an improvement over the status quo. Programmers who assert the nonexistence of overflow will write the same code they do today, so no cognitive overhead there, and I won't have to worry about silent errors if they're wrong. |
I believe that the two proposals are compatible (and even complementary). We could make
Could you elaborate on this point? I would expect that most code would either not handle the panic, or use a wrapping integer type explicitly. Even the latter option does not seem like a lot of extra code. |
That is a nice data point, and I think it nicely illustrates the three options I propose for intentional wraparound. Consider this snippet: h := uint64(seed + s*hashkey[0])
tail:
switch {
case s == 0:
case s < 4:
h ^= uint64(*(*byte)(p))
h ^= uint64(*(*byte)(add(p, s>>1))) << 8
h ^= uint64(*(*byte)(add(p, s-1))) << 16
h = rotl_31(h*m1) * m2
case s <= 8:
h ^= uint64(readUnaligned32(p))
h ^= uint64(readUnaligned32(add(p, s-4))) << 32
h = rotl_31(h*m1) * m2
h := uint64(seed + s*hashkey[0])
tail:
switch {
case s == 0:
case s < 4:
h ^= uint64(*(*byte)(p))
x, _ := uint64(*(*byte)(add(p, s>>1))) << 8
h ^= x
x, _ = uint64(*(*byte)(add(p, s-1))) << 16
h ^= x
x, _ = h * m1
h, _ = rotl_31(x) * m2
case s <= 8:
h ^= uint64(readUnaligned32(p))
x, _ := uint64(readUnaligned32(add(p, s-4))) << 32
h ^= x
x, _ = h * m1
h = rotl_31(x) * m2
h := uint64mod(uintptrmod(seed) + uintptrmod(s)*hashkey[0])
tail:
switch {
case s == 0:
case s < 4:
h ^= uint64mod(*(*byte)(p))
h ^= uint64mod(*(*byte)(add(p, s>>1))) << 8
h ^= uint64mod(*(*byte)(add(p, s-1))) << 16
h = rotl_31(h*m1) * m2
case s <= 8:
h ^= uint64mod(readUnaligned32(p))
h ^= uint64mod(readUnaligned32(add(p, s-4))) << 32
h = rotl_31(h*m1) * m2
|
@bcmills The point of the sized integer types is a) to be able to control actual space consumed when laid out in memory, and b) often enough that they do wrap around. There's tons of code that makes use of that. Almost any left-shift operation would become more complicated if there wasn't silent overflow. Thus, a lot of code would have to deal with overflow. |
That's part of my point? At the moment, the sized integer types conflate together (a) and (b). I'm proposing that we make them orthogonal, not eliminate (b).
Perhaps that's a good argument for making the left-shift operator not panic? (None of the other bitwise operators can overflow, and that would make the shift operator somewhat less redundant with multiplication.) |
@bcmills Perhaps. My point is that this all adds extra complexity to the language where I am not convinced that we need more. We need less. Most people couldn't care less about overflow and simply want integers that "just work". |
That's also part of my point? The fact that most people couldn't care less about overflow is what leads to the bugs in the first place. Most people couldn't care less about bounds-checking either, but Go has bounds checks nonetheless. I agree that 'integers that "just work"' is a desirable goal, and ideally I would like to see this proposal combined with an arbitrary-precision |
Is it possible to add this type into go1.9? |
@bronze1man The point of this proposal is to make detection of overflow the default behavior. That's why it's a language proposal and not just a library. |
Regarding cost: overflow-checking is normally one instruction (a conditional branch based on the ALU's overflow flag), and for addition and subtraction my understanding is that modern Intel hardware will fuse the arithmetic instruction and the branch into one µop to be executed on the branch unit. I don't see how we could implement arbitrary-precision integers with any fewer than two additional instructions per op. If we encode tags in the sign bit then every operation needs a shift in, shift out, and mask. If we encode tags in the least-significant bit then every operation needs at least a mask and a branch, and it's not obvious to me that the branch can be fused. |
@bcmills Regarding arbitrary-precision integers: It's probably 2-3 additional instructions in the general case. But there's usually no masking needed in the common case if the tag bits are at the bottom (see #19623 (comment)). |
How do you test the tag bits without masking them? I would expect panic-on-overflow to look like: add %rax, %rdx
jo $overflowPanic with a single Or perhaps, if we need to save the faulting instruction more precisely, something like: add %rax, %rdx
cmovo $0, $overflowPanic and let the If I'm understanding correctly, an arbitrary-precision operation would look like: add %rax, %rdx
test %rax, $0x3
jnz $addSlow and it's not at all obvious to me that we could get by with a small number of variants of |
@bcmills: on x86 at least, we can use the parity (low bit of op result) condition code.
(Those are 3-operand ADD and SUB, we'd need 2 moves also to do this with 2-operand instructions, I think.) How to implement addSlow will be a problem. Because we'd have to spill all live registers around any runtime call, we'd need essentially infinite variations. We'd have to generate them as needed and it would probably be a lot of code. We could use faulting instructions, but those are slow and we'd still need a stack + register map for each. |
Just a dumb experiment - I hacked the compiler to add the following sequence after very
It doesn't do anything, just adds some cruft to simulate overflow checks. It makes the
A few are hurt quite a bit, but a surprising number don't care so much. |
That still seems like a lot of extra instructions compared to what is needed for panic-on-overflow. At any rate, my performance point on this issue is more "the cost of overflow checks is fairly low", with arbitrary-length integers as a reference point for an integer cost that a lot of folks believe to be reasonable. |
@randall77 The problem with using only one bit is that you cannot optimistically add two tagged ints. The problem with using a 1 (instead of a 0) as tag bit is that one has to correct for it each time. Having a 1-offset pointer is trivial to correct when accessing through pointer-indirection. Again, using the scheme I have outlined before, addition is (dst on the right):
If both x and y are tagged ints, they have a 00 tag (least significant 2 bits). The result is already correct. If one or both of them have a 01 tag, the result tags are going to be 01 or 10 - either way its not 00 after masking. In that case we need to run the slow routine. This is 4 instructions per addition in the best case. |
@griesemer , yes, I guess you're trading a bit in the representation for one less instruction. |
With panic-on-overflow, the compiler even cannot fold |
@randall77 As I assume that JSONEncode do not have int overflow bug right now,and JSONEncode/JSONDecode uses 60% of my server's CPUs.😁 I hope golang can do better than that. JSONEncode-8 14.5ms ± 1% 15.6ms ± 0% +8.12% (p=0.008 n=5+5) |
Even without panic on overflow we can't fold |
@bronze1man Do you have reason to think that the cost of JSON encoding is dominated by integer arithmetic? On a modern CPU many algorithms are dominated by the time it takes to load values from memory, and that should not change under this proposal. |
@nathany, here's an interesting analysis of the |
I propose that Go adds For example, the snippet from hash64.go would look like: h := uint64(seed + s*hashkey[0])
tail:
switch {
case s == 0:
case s < 4:
unchecked {
h ^= uint64(*(*byte)(p))
h ^= uint64(*(*byte)(add(p, s>>1))) << 8
h ^= uint64(*(*byte)(add(p, s-1))) << 16
h = rotl_31(h*m1) * m2
}
case s <= 8:
unchecked {
h ^= uint64(readUnaligned32(p))
h ^= uint64(readUnaligned32(add(p, s-4))) << 32
h = rotl_31(h*m1) * m2
} |
@firelizzard18, that's only marginally clearer than the version in #19624 (comment), and costs two keywords and a new block syntax. |
@bcmills Let's agree to disagree. In this particular case, they're not too different. I find the |
I think you have misunderstood the proposal. You would only need one assignment per “expression consisting of arithmetic operators and / or conversions between integer types”, not one per operation: the check applies to the whole expression tree. In the current draft, the only place you have to chop up the expression tree is at function-call boundaries, and even that could perhaps be relaxed. |
If we adopted this change, it would mean that the same program would compile under two different versions of the language, but would behave significantly differently. With my current understanding of language transitions (https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md) that is a bad idea. So I don't think we can adopt this proposal as is. |
Agreed, this proposal does not comply with that transition plan. I've filed #30209 as an alternative, and I believe that it does. |
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013) and openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Given the above concerns, I withdraw this proposal in favor of #30209. I'm much happier with the migration strategy for that proposal; thanks for the prompt to reconsider. |
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013) and openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
Catching up with openshift/installer@dafc79f0 (Generate Network.cluster config instead of NetworkConfig.networkoperator, 2019-01-15, openshift/installer#1013), openshift/installer@3b393da8 (pkg/types/aws/machinepool: Drop IAM-role overrides, 2019-01-30, openshift/installer#1154), and openshift/installer@9ad20c35 (pkg/destroy/aws: Remove ClusterName consumer, 2019-01-31, openshift/installer#1170). The uint32 -> int32 cast is slightly dangerous, because it will silently wrap overflowing values [1,2]. But I'll try and get the installer updated to use unsigned types as well, and then we won't have to worry about converting. [1]: golang/go#19624 [2]: golang/go#30209
I know this has been discussed before, but I didn't see a specific proposal filed for it yet and I think it's important.
Unexpected integer overflow can lead to serious bugs, including bugs in Go itself. Go's bounds-checking on slices and arrays mitigates some of the harmful effects of overflow, but not all of them. For example, programs that make system calls may pass data structures into the kernel, bypassing Go's usual bounds checks. Programs that marshal data-structures to be sent over the wire (such as protocol buffers) may send silently-corrupted data instead of returning errors as they ought to. And programs that use
unsafe
to access addresses with offsets are vulnerable to exactly the same overflow bugs as in C.In my experience, Go programs and libraries are often written assuming "reasonable inputs" and no overflow. For such programs, it would be clearer for overflow to cause a run-time panic (similar to dividing by zero) rather than silently wrapping around. Even in the case where the unintended overflow is subsequently caught by a slice bounds check, reporting the error at the overflowing operation rather than the slice access would make the source of the bug easier to diagnose.
The potential performance impact of this proposal is similar to bounds-checking in general, and likely lower than using arbitrary-precision ints (#19623). The checks can be omitted when the compiler can prove the result is within bounds, any new branches will be trivially predictable (they'll occupy some CPU resources in the branch-predictor but otherwise add little overhead), and in some cases the checks might be able to use bounds-check instructions or other hardware traps.
For the subset of programs and libraries that intentionally make use of wraparound, we could provide one of several alternatives:
or "comma, carry" forms (proposal: spec: extend comma-ok expressions to + - * / arithmetic #6815)that ignore overflow panics, analogous to how the "comma, ok" form of a type-assertion ignores the panic from a mismatched type.int32wrap
orint32mod
.uint32
and friends), since they're used for bit-manipulation code more often than the signed equivalents.Those alternatives could also be used to optimize out the overflow checks in inner-loop code when the programmer has already validated the inputs by some other means.
[Edit: added this section in response to comments.]
Concretely, the proposed changes to the spec are:
Integer operators
For two integer values
x
andy
, the integer quotientq = x / y
and remainderr = x % y
satisfy the following relationships:[…]
As an exception to this rule, if the dividendx
is the most negative value for the int type ofx
, the quotientq = x / -1
is equal tox
(andr = 0
).[…]
The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. The result of a logical shift is truncated to the bit width of the type: a logical shift never results in overflow. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.
[…]
Integer overflow
If the result of any arithmetic operator or conversion to an integer type cannot be represented in the type, a run-time panic occurs.
An expression consisting of arithmetic operators and / or conversions between integer types used in an assignment or initialization of the special form
yields an additional untyped boolean value. The value of
ok
istrue
if the results of all arithmetic operators and conversions could be represented in their respective types. Otherwise it isfalse
and the value ofv
is computed as follows. No run-time panic occurs in this case.For unsigned integer values, the operations +, -, *, and << are computed modulo 2ⁿ upon overflow, where n is the bit width of the unsigned integer's type. Loosely speaking, these unsigned integer operations discard high bits
upon overflow, and programs may rely on ``wrap around''.For signed integers, the operations +, -, *, and << are computed using two's complement arithmetic and truncated to the bit width of the signed integer's type upon overflow.
No exception is raised as a result of overflow. A compiler may not optimize code under the assumption that overflow does not occur. For instance, it may not assume that x < x + 1 is always true.If the dividend
x
of a quotient or remainder operation is the most negative value for the int type ofx
, evaluation ofx / -1
overflows and its result upon overflow is equal tox
. In contrast, evaluation ofx % -1
does not overflow and yields a result of0
.[…]
Conversions between numeric types
For the conversion of non-constant numeric values, the following rules apply:
v := uint16(0x10F0)
, thenw, _ := uint32(int8(v))
results inw == 0xFFFFFFF0
.This proposal is obviously not compatible with Go 1, but I think we should seriously consider it for Go 2.
The text was updated successfully, but these errors were encountered: