zero out memory of uninitialized fields #9147

StefanKarpinski · 2014-11-25T15:34:28Z

Raised here: https://groups.google.com/forum/#!topic/julia-users/O5S8pPav5Ks.

eschnett · 2014-11-25T15:49:37Z

Instead of initializing them to zero, we could initialize them to a large, nonsensical value to help catch access of uninitialized fields.

StefanKarpinski · 2014-11-25T15:54:18Z

I guess, but this only applies to pure data types like Int or Float64. Frankly, I'm still not entirely convinced that this is a real usability problem, but unlike arrays, there's no good performance reason not to guarantee zeroed memory.

ViralBShah · 2014-11-25T17:12:35Z

The zeroing could take a lot of time if you are inserting lots of things into a large tree or a list, or even an array of composite types.

JeffBezanson · 2014-11-25T17:42:40Z

Like Stefan, I'm not too worried about performance here. One reason is that uninitialized object fields are relatively rare. new calls that pass all arguments would be unaffected. LLVM might also be able to remove the extra store in x = new(); x.fld = 1. And if the object is heap allocated, the overhead of an extra store would be comparatively small.

One corner case that could cause problems is uninitialized bits fields in immutable types. It's a gotcha if they are zeroed when allocating individual objects, but not when allocating an array of them. Right now we consistently say "bits aren't automatically zeroed". If you like automatic zeroing, you want it everywhere, and doing it sometimes is arguably worse than doing it never.

One way out of that corner case is to disallow explicitly-uninitialized bits fields in immutables. Uninitialized references in immutables have uses (e.g. Nullable{BigMutableThing}), but uninitialized bits fields are less reasonable.

Frankly, I'd rather leave it as-is, or zero everything. For small arrays we can just pay the price, and allocate big arrays directly with mmap. Might not be so bad.

eschnett · 2014-11-25T18:36:02Z

I'd be in favour of initializing everything. If this turns out to be a bottleneck, as measured in a validated benchmark, then we can see whether introducing ccall(:jl_allocate_uninitialized_array) for a few special cases wouldn't do the trick.

stevengj · 2014-11-25T23:58:50Z

Regarding zero-ing everything, it wouldn't be too hard to change our malloc_a16 function (in gc.c, which is used to allocate arrays) to a calloc_a16 function, which called calloc, shifts the pointer, and stores the original pointer before the pointed-to data. This is how the _aligned_malloc function works on Windows, and how we defined a 16-byte (or 32-byte) aligned malloc for FFTW (which is so trivial I don't think relicensing would be an issue).

ViralBShah · 2014-11-26T02:47:04Z

I would prefer zeroing everything rather only in some places - or use a specific byte pattern. I guess we can start by using calloc and validate through a benchmark as suggested.

ViralBShah · 2014-11-26T02:48:31Z

Also this is something we can presumably backport to 0.3 for Ron's class, if it all works out.

tknopp · 2014-11-26T21:09:38Z

Which would be a pretty drastic semantic change within one version number...

Regarding the actual issue: I think it is a good idea to initialize with zero if the performance degradation are negligible. Would still be good to have a flag to get the malloced array if desired.

StefanKarpinski · 2014-11-26T21:21:08Z

Which would be a pretty drastic semantic change within one version number...

It's a safe change though since this is not a behavior anyone could reasonably rely on.

stevengj · 2014-11-26T22:22:16Z

(@tknopp, you can always call pointer_to_array(convert(Ptr{T}, c_malloc(sizeof(T) * n)), n, true) or similar to get a malloced array, so I don't think we necessarily need a flag. Assuming the overhead of calloc is normally negligible, anyone needing an uninitialized array will be working at such a low level that calling c_malloc won't be unreasonable.)

I tend to agree that people shouldn't rely on this behavior and it probably shouldn't even be documented; they should use zeros(...) if they want guaranteed zero initialization. (Of course, the implementation of zeros in Base can take advantage of it.)

tknopp · 2014-11-27T23:45:16Z

@StefanKarpinski: Indeed. Still, I am not sure if backporting features or semantic language changes is a good idea. Its hard to keep track in which version the feature gets in. Or one might even have to distinguish minor version numbers (e.g. 0.33 and 0.34) when a new feature gets in in 0.34. This then has impact for all packages...

@stevengj: While I use ones and zeros myself when initializing an array I think the Array constructor should be a valid way to initialize an array. Currently I am not using it because I want zero initialization. If the constructor would initialize with zero, it would be IMHO the more logical way to create an array. For every other datastructure I also use the constructor.

stevengj · 2014-11-28T00:41:12Z

@tknopp, I'm not saying you shouldn't use the constructor. I'm saying that if a calloc version is fast enough then we need not provide a high-level uninitialized-array constructor (nor "a flag" for this).

stevengj · 2014-11-28T19:44:40Z

I made an experimental branch that uses calloc instead of malloc, and so far I haven't been able to detect any performance differences (all the differences are swamped by the measurement noise) on MacOS.

StefanKarpinski · 2014-12-07T06:26:08Z

Interesting and tangentially related: http://research.swtch.com/sparse

RauliRuohonen · 2014-12-20T19:33:28Z

Do you want users rely on zero initialization? If yes, best implement and document it so everyone's on the same page. If no, use some nonzero filler like 0b10101010 or just leave it uninitialized like it is today. Facts of life: if you implement zero initialization, users will rely on it, documented or not, whether you want them to or not. Either way, there should be some easy way to get uninitialized memory, like e.g. NumPy has empty() in addition to zeros() and ones() which you can use when you want performance.

sbromberger · 2015-01-06T02:50:18Z

@RauliRuohonen in the absence of explicit documentation to the contrary (and even then, not guaranteed), users will default to assuming zero initialization. This is the case in Graphs.jl, where dijkstra_shortest_paths can return uninitialized memory (see JuliaAttic/OldGraphs.jl#140 for an example).

This newbie's vote is for zero-by-default, and the sooner it's implemented, the better.

ViralBShah · 2015-01-06T04:39:59Z

I personally would prefer a byte pattern if we were to do this.

Also it is quite safe to do this by default, and in the few performance sensitive cases, have a way to get uninitialized memory.

sbromberger · 2015-01-06T04:42:49Z

I personally would prefer a byte pattern if we were to do this.

I'm genuinely curious - why would a byte pattern be preferable to zeros, especially when new pages are supposedly zeroed by the OS by default?

JeffBezanson · 2015-01-06T04:57:29Z

A byte pattern makes it easier to find uses of uninitialized values. The implication is that people must make sure to manually initialize everything, or else they will get some big useless value which at least makes it easier to find the bug.

However, this strikes me as going out of our way to slap people on the wrist. If we are going to put in the effort to guarantee initialization, I'd rather do people a favor and initialize with a likely-useful value (zero). You'd never need to write Foo(x,y) = new(0,0). And given calloc, there might be a performance advantage.

sbromberger · 2015-01-06T05:03:11Z

they will get some big useless value which at least makes it easier to find the bug

Or, in a worst case, they will get a big useless value that is close enough to an expected value that it slips through, and causes some catastrophic failure down the line?

Unless Julia's going to explicitly test and warn on uninitialized values using this byte pattern (thereby voiding any legitimate uses of that particular pattern), I don't see the advantage - and I see two disadvantages: 1), as you said, calloc() provides an optimized zero, and writing a specific byte pattern might result in poorer peformance; and 2) the principle of "do[ing] what is expected" seems to favor zeros.

StefanKarpinski · 2015-01-07T21:17:35Z

I think that either doing what we do now or initializing with zeros and having that be a specified, reliable behavior are the two best options.

johnmyleswhite · 2015-01-07T21:20:02Z

I think initializing to zeros is really the way to go unless there's a serious performance cost. It simplifies the mental model of how memory allocation works and provides a lot more security.

sbromberger · 2015-01-07T21:23:32Z

Proposal: zero-fill by default; provide a named parameter for an option to use "raw" malloc for when performance is über-critical.

The security issue is nothing to sneeze at, especially, for example, when building out web services with authenticated sessions. Also, it would make auditing things like Crypto.jl that much more complex.

sbromberger · 2017-07-03T18:55:38Z

@JeffBezanson can I hope that your self-assignment here means that we're making progress on this? :)

JeffBezanson · 2017-07-03T19:34:01Z

It means, at least, that there seems to be a consensus that we should do this before 1.0.

jebej · 2017-07-03T19:54:52Z

It was mentioned a few times that there would not be a performance penalty for zero initialization, but this currently does not seem to be the case (unless I am misunderstanding something):

julia> @benchmark zeros(10^6)
BenchmarkTools.Trial:
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     1.458 ms (0.00% GC)
  median time:      1.617 ms (0.00% GC)
  mean time:        2.135 ms (24.87% GC)
  maximum time:     44.558 ms (96.23% GC)
  --------------
  samples:          2333
  evals/sample:     1

julia> @benchmark Vector{Float64}(10^6)
BenchmarkTools.Trial:
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     876.000 ns (0.00% GC)
  median time:      2.046 μs (0.00% GC)
  mean time:        41.900 μs (84.46% GC)
  maximum time:     41.694 ms (99.64% GC)
  --------------
  samples:          10000
  evals/sample:     1

I'm still having trouble understanding why one would ask for uninitialized memory and then be caught by a "bug" due to that. If you want a zeroed out array, call the zeros function. There is no good reason why 0 is a better "uninitialized" value than any other, and as mentioned above there are cases where 0 does not make much sense (e.g. for a rational 0//0).

It seems like most of the trouble mentioned in this thread could be solved by a find-and-replace of Array with zeros...

stevengj · 2017-07-03T20:03:34Z

@jebej, to eliminate the performance penalty, we have to change the underlying implementation to use calloc. The current zeros will definitely be slower.

StefanKarpinski · 2017-07-11T13:54:50Z

I'm still having trouble understanding why one would ask for uninitialized memory and then be caught by a "bug" due to that. If you want a zeroed out array, call the zeros function. There is no good reason why 0 is a better "uninitialized" value than any other, and as mentioned above there are cases where 0 does not make much sense (e.g. for a rational 0//0).

Zero is often a pretty good choice for uninitialized memory, e.g. for integers, floats. Most of the time you end up getting zero anyway. Except when you don't. If zero is an ok value and non-zero is a bad value, now you have a rare, non-deterministic bug. Anyone who's had to track down a rare, non-deterministic bug can tell you that this is the worst kind of bug. (Except for rare, non-deterministic bugs that disappear when you look for them – those are even worse.)

In some sense, the "right" solution is to fill uninitialized memory with truly random bytes so that bugs in code that accidentally use uninitialized memory are more reliably forced to happen. However, then you still have a predictability issue since there could still be rare cases where random values are bad. So filling memory with random data doesn't solve the problem either. Moreover, filling memory with random data is really expensive. Filling memory with some fixed value, on the other hand, does solve the problem since if that value is problematic, it's reliably problematic. And if that value is zero, then doing this is essentially free.

Finally, other languages (e.g. C#) which have made the choice to fill uninitialized memory with zeros do not seem to have any regrets about this – there are no blog posts calling this a "billion dollar mistake" or complaining about it at all, for that matter.

StefanKarpinski · 2017-08-31T19:27:00Z

This is a non-breaking change, so we can move it off the 1.0 milestone.

ViralBShah · 2022-08-05T19:00:10Z

Do we still want to keep this issue open?

StefanKarpinski · 2022-09-28T18:45:00Z

Yes, I still think we should do this. I believe that @JeffBezanson wants to do it as well.

JeffreySarnoff · 2022-09-29T09:12:35Z

+1

xgdgsc · 2023-02-21T09:31:24Z

Any progress? Is there a "memset" to 0 on struct I could use to write less zeros in constructor ?

gbaraldi · 2023-02-21T18:51:55Z

There are a couple steps we can take here. I believe you can ask LLVM to zero things out for you, and we probably have to do something on the side of the GC. Maybe zero on page initialization/ask for zeroed pages, and then zero it again when freeing.

JeffBezanson added the needs decision A decision on this change is needed label Nov 25, 2014

StefanKarpinski changed the title ~~zero memory of uninitialized fields~~ zero out memory of uninitialized fields Nov 25, 2014

stevengj added a commit to stevengj/julia that referenced this issue Nov 28, 2014

replace malloc with calloc for array allocation for JuliaLang#9147

7c9ab1d

stevengj added a commit to stevengj/julia that referenced this issue Nov 28, 2014

replace malloc with calloc for array allocation for JuliaLang#9147

4bce620

vchuravy mentioned this issue Dec 24, 2014

[RFC] Map for Nullable #9446

Closed

sbromberger mentioned this issue Jan 12, 2015

RFC / Discussion: Security and Julia #9744

Closed

tkelman modified the milestones: 1.0, 0.6.0 Dec 29, 2016

JeffBezanson self-assigned this Jun 4, 2017

JeffBezanson mentioned this issue Jun 8, 2017

Ensure indexing into Array{Bool} returns Bool #22293

Closed

JeffBezanson mentioned this issue Jul 25, 2017

WIP: Use calloc for zeros #19852

Closed

6 tasks

JeffBezanson added a commit that referenced this issue Jul 27, 2017

zero-fill all Arrays on allocation. part of #9147

3218be2

JeffBezanson added a commit that referenced this issue Jul 28, 2017

zero-fill all Arrays on allocation. part of #9147

4a110a3

JeffBezanson added a commit that referenced this issue Jul 29, 2017

zero-fill all Arrays on allocation. part of #9147

9d16c36

JeffBezanson added a commit that referenced this issue Jul 31, 2017

zero-fill all Arrays on allocation. part of #9147

4cc0520

StefanKarpinski modified the milestones: 1.x, 1.0 Aug 31, 2017

nalimilan mentioned this issue Sep 6, 2017

Merge collect() into Vector()? #16029

Open

nalimilan mentioned this issue Sep 15, 2017

RFC: Get rid of #undef and replace it with null in Array{Union{Null, T}} #23721

Closed

vtjnash removed this from the 1.x milestone Feb 3, 2021

vtjnash mentioned this issue Jun 3, 2021

Semantics of Expr(:new) underspecified #26764

Closed

This was referenced Oct 15, 2021

Define and export undefs similar to zeros and ones #42620

Closed

jl_gc_calloc_aligned and friends: obtaining aligned zero initialized memory efficiently #42704

Open

Seelengrab mentioned this issue Jan 26, 2022

deleteat!: Handle unassigned array elements #43941

Merged

mbauman mentioned this issue Nov 8, 2023

"Poison" similar arrays #52080

Closed

zero out memory of uninitialized fields #9147

zero out memory of uninitialized fields #9147

Comments

StefanKarpinski commented Nov 25, 2014

eschnett commented Nov 25, 2014

StefanKarpinski commented Nov 25, 2014

ViralBShah commented Nov 25, 2014

JeffBezanson commented Nov 25, 2014

eschnett commented Nov 25, 2014

stevengj commented Nov 25, 2014

ViralBShah commented Nov 26, 2014

ViralBShah commented Nov 26, 2014

tknopp commented Nov 26, 2014

StefanKarpinski commented Nov 26, 2014

stevengj commented Nov 26, 2014

tknopp commented Nov 27, 2014

stevengj commented Nov 28, 2014

stevengj commented Nov 28, 2014

StefanKarpinski commented Dec 7, 2014

RauliRuohonen commented Dec 20, 2014

sbromberger commented Jan 6, 2015

ViralBShah commented Jan 6, 2015

sbromberger commented Jan 6, 2015

JeffBezanson commented Jan 6, 2015

sbromberger commented Jan 6, 2015

StefanKarpinski commented Jan 7, 2015

johnmyleswhite commented Jan 7, 2015

sbromberger commented Jan 7, 2015

sbromberger commented Jul 3, 2017

JeffBezanson commented Jul 3, 2017

jebej commented Jul 3, 2017

stevengj commented Jul 3, 2017

StefanKarpinski commented Jul 11, 2017

StefanKarpinski commented Aug 31, 2017

ViralBShah commented Aug 5, 2022

StefanKarpinski commented Sep 28, 2022

JeffreySarnoff commented Sep 29, 2022

xgdgsc commented Feb 21, 2023 • edited Loading

gbaraldi commented Feb 21, 2023

xgdgsc commented Feb 21, 2023 •

edited

Loading