-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zero out memory of uninitialized fields #9147
Comments
Instead of initializing them to zero, we could initialize them to a large, nonsensical value to help catch access of uninitialized fields. |
I guess, but this only applies to pure data types like Int or Float64. Frankly, I'm still not entirely convinced that this is a real usability problem, but unlike arrays, there's no good performance reason not to guarantee zeroed memory. |
The zeroing could take a lot of time if you are inserting lots of things into a large tree or a list, or even an array of composite types. |
Like Stefan, I'm not too worried about performance here. One reason is that uninitialized object fields are relatively rare. One corner case that could cause problems is uninitialized bits fields in immutable types. It's a gotcha if they are zeroed when allocating individual objects, but not when allocating an array of them. Right now we consistently say "bits aren't automatically zeroed". If you like automatic zeroing, you want it everywhere, and doing it sometimes is arguably worse than doing it never. One way out of that corner case is to disallow explicitly-uninitialized bits fields in immutables. Uninitialized references in immutables have uses (e.g. Frankly, I'd rather leave it as-is, or zero everything. For small arrays we can just pay the price, and allocate big arrays directly with mmap. Might not be so bad. |
I'd be in favour of initializing everything. If this turns out to be a bottleneck, as measured in a validated benchmark, then we can see whether introducing |
Regarding zero-ing everything, it wouldn't be too hard to change our |
I would prefer zeroing everything rather only in some places - or use a specific byte pattern. I guess we can start by using calloc and validate through a benchmark as suggested. |
Also this is something we can presumably backport to 0.3 for Ron's class, if it all works out. |
Which would be a pretty drastic semantic change within one version number... Regarding the actual issue: I think it is a good idea to initialize with zero if the performance degradation are negligible. Would still be good to have a flag to get the malloced array if desired. |
It's a safe change though since this is not a behavior anyone could reasonably rely on. |
(@tknopp, you can always call I tend to agree that people shouldn't rely on this behavior and it probably shouldn't even be documented; they should use |
@StefanKarpinski: Indeed. Still, I am not sure if backporting features or semantic language changes is a good idea. Its hard to keep track in which version the feature gets in. Or one might even have to distinguish minor version numbers (e.g. 0.33 and 0.34) when a new feature gets in in 0.34. This then has impact for all packages... @stevengj: While I use |
@tknopp, I'm not saying you shouldn't use the constructor. I'm saying that if a |
I made an experimental branch that uses |
Interesting and tangentially related: http://research.swtch.com/sparse |
Do you want users rely on zero initialization? If yes, best implement and document it so everyone's on the same page. If no, use some nonzero filler like 0b10101010 or just leave it uninitialized like it is today. Facts of life: if you implement zero initialization, users will rely on it, documented or not, whether you want them to or not. Either way, there should be some easy way to get uninitialized memory, like e.g. NumPy has |
@RauliRuohonen in the absence of explicit documentation to the contrary (and even then, not guaranteed), users will default to assuming zero initialization. This is the case in Graphs.jl, where This newbie's vote is for zero-by-default, and the sooner it's implemented, the better. |
I personally would prefer a byte pattern if we were to do this. Also it is quite safe to do this by default, and in the few performance sensitive cases, have a way to get uninitialized memory. |
I'm genuinely curious - why would a byte pattern be preferable to zeros, especially when new pages are supposedly zeroed by the OS by default? |
A byte pattern makes it easier to find uses of uninitialized values. The implication is that people must make sure to manually initialize everything, or else they will get some big useless value which at least makes it easier to find the bug. However, this strikes me as going out of our way to slap people on the wrist. If we are going to put in the effort to guarantee initialization, I'd rather do people a favor and initialize with a likely-useful value (zero). You'd never need to write |
Or, in a worst case, they will get a big useless value that is close enough to an expected value that it slips through, and causes some catastrophic failure down the line? Unless Julia's going to explicitly test and warn on uninitialized values using this byte pattern (thereby voiding any legitimate uses of that particular pattern), I don't see the advantage - and I see two disadvantages: 1), as you said, calloc() provides an optimized zero, and writing a specific byte pattern might result in poorer peformance; and 2) the principle of "do[ing] what is expected" seems to favor zeros. |
I think that either doing what we do now or initializing with zeros and having that be a specified, reliable behavior are the two best options. |
I think initializing to zeros is really the way to go unless there's a serious performance cost. It simplifies the mental model of how memory allocation works and provides a lot more security. |
Proposal: zero-fill by default; provide a named parameter for an option to use "raw" malloc for when performance is über-critical. The security issue is nothing to sneeze at, especially, for example, when building out web services with authenticated sessions. Also, it would make auditing things like Crypto.jl that much more complex. |
@JeffBezanson can I hope that your self-assignment here means that we're making progress on this? :) |
It means, at least, that there seems to be a consensus that we should do this before 1.0. |
It was mentioned a few times that there would not be a performance penalty for zero initialization, but this currently does not seem to be the case (unless I am misunderstanding something): julia> @benchmark zeros(10^6)
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 1.458 ms (0.00% GC)
median time: 1.617 ms (0.00% GC)
mean time: 2.135 ms (24.87% GC)
maximum time: 44.558 ms (96.23% GC)
--------------
samples: 2333
evals/sample: 1
julia> @benchmark Vector{Float64}(10^6)
BenchmarkTools.Trial:
memory estimate: 7.63 MiB
allocs estimate: 2
--------------
minimum time: 876.000 ns (0.00% GC)
median time: 2.046 μs (0.00% GC)
mean time: 41.900 μs (84.46% GC)
maximum time: 41.694 ms (99.64% GC)
--------------
samples: 10000
evals/sample: 1 I'm still having trouble understanding why one would ask for uninitialized memory and then be caught by a "bug" due to that. If you want a zeroed out array, call the It seems like most of the trouble mentioned in this thread could be solved by a find-and-replace of |
@jebej, to eliminate the performance penalty, we have to change the underlying implementation to use |
Zero is often a pretty good choice for uninitialized memory, e.g. for integers, floats. Most of the time you end up getting zero anyway. Except when you don't. If zero is an ok value and non-zero is a bad value, now you have a rare, non-deterministic bug. Anyone who's had to track down a rare, non-deterministic bug can tell you that this is the worst kind of bug. (Except for rare, non-deterministic bugs that disappear when you look for them – those are even worse.) In some sense, the "right" solution is to fill uninitialized memory with truly random bytes so that bugs in code that accidentally use uninitialized memory are more reliably forced to happen. However, then you still have a predictability issue since there could still be rare cases where random values are bad. So filling memory with random data doesn't solve the problem either. Moreover, filling memory with random data is really expensive. Filling memory with some fixed value, on the other hand, does solve the problem since if that value is problematic, it's reliably problematic. And if that value is zero, then doing this is essentially free. Finally, other languages (e.g. C#) which have made the choice to fill uninitialized memory with zeros do not seem to have any regrets about this – there are no blog posts calling this a "billion dollar mistake" or complaining about it at all, for that matter. |
This is a non-breaking change, so we can move it off the 1.0 milestone. |
Do we still want to keep this issue open? |
Yes, I still think we should do this. I believe that @JeffBezanson wants to do it as well. |
+1 |
Any progress? Is there a " |
There are a couple steps we can take here. I believe you can ask LLVM to zero things out for you, and we probably have to do something on the side of the GC. Maybe zero on page initialization/ask for zeroed pages, and then zero it again when freeing. |
Raised here: https://groups.google.com/forum/#!topic/julia-users/O5S8pPav5Ks.
The text was updated successfully, but these errors were encountered: