
"Poison" similar arrays #52080

Closed
PallHaraldsson opened this issue Nov 8, 2023 · 7 comments

Comments

@PallHaraldsson
Contributor

I would suggest initializing the first bytes of an array to NaN, then NaN32 (just in case some code uses that type or reinterprets), then NaN16, for a total of 112 bits (14 bytes), less than a cache line on all CPUs. This can be done with a single store instruction on at least ARM64, which allows storing a pair of words.

For these concatenated bitstrings:

julia> bitstring(NaN)
"0111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(NaN32)
"01111111110000000000000000000000"

julia> bitstring(NaN16)
"0111111000000000"

This would guarantee (some) NaNs always, not just sometimes:
https://discourse.julialang.org/t/occasionally-nans-when-using-similar/48224/30?u=palli

We want to somehow discourage use of similar, or incorrect use of it, and getting a NaN signals that you need to think more. When you allocate an array (or the new memory type, or allocate in general, which is what similar is based on), you only need to store some metadata, not actually touch the allocated region; that is the point of similar, for performance.

So storing into even a prefix of it (or all of it) might be thought of as a performance issue, but most likely you're going to access the array later, maybe soon, and then you recover the cost of the store. It could even be faster, since a load is always slow if the data is not in cache, but not if you have stored to it first. Possibly this should be done for the end of the array too, reserving one more cache line.

That would help if the array is read from the end; you don't know from which end it will be read, if not from both (as with quicksort).
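The prefix store proposed above could be sketched roughly like this (`poison_prefix!` is a hypothetical helper name, not anything in Base; a sketch only, assuming a little-endian layout for the byte offsets):

```julia
# Hypothetical sketch (not in Base): store the concatenated NaN/NaN32/NaN16
# bit patterns into the first 14 bytes of a freshly allocated Float64 array.
function poison_prefix!(a::Vector{Float64})
    sizeof(a) < 14 && return a   # too small to hold all three patterns
    GC.@preserve a begin
        p = Ptr{UInt8}(pointer(a))
        unsafe_store!(Ptr{UInt64}(p),      reinterpret(UInt64, NaN))    # bytes 1-8
        unsafe_store!(Ptr{UInt32}(p + 8),  reinterpret(UInt32, NaN32))  # bytes 9-12
        unsafe_store!(Ptr{UInt16}(p + 12), reinterpret(UInt16, NaN16))  # bytes 13-14
    end
    return a
end

a = poison_prefix!(Vector{Float64}(undef, 4))
isnan(a[1])  # true: reading the "uninitialized" prefix now reliably yields NaN
```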

@PallHaraldsson PallHaraldsson changed the title Posion similar arrays "Poison" similar arrays Nov 8, 2023
@adienes
Contributor

adienes commented Nov 8, 2023

want to discourage use of similar,

wait, what? why?

@PallHaraldsson
Contributor Author

PallHaraldsson commented Nov 8, 2023

That was an incomplete quote, missing "or incorrect use of it".

What's proposed here would make such misuse more obvious, without slowing down correct (or incorrect) use of similar; it could even make it faster in the end.

@mbauman
Member

mbauman commented Nov 8, 2023

See #9147

@mikmoore
Contributor

mikmoore commented Nov 8, 2023

I'll remark that it's not necessary to include every size of NaN to get a NaN in an IEEEFloat array. An all-ones bit pattern is a NaN at every IEEE float width, so we just need one pattern as long as the longest IEEEFloat.

julia> reinterpret(Float64,Int64[-1])
1-element reinterpret(Float64, ::Vector{Int64}):
 NaN

julia> reinterpret(Float32,Int64[-1])
2-element reinterpret(Float32, ::Vector{Int64}):
 NaN
 NaN

julia> reinterpret(Float16,Int64[-1])
4-element reinterpret(Float16, ::Vector{Int64}):
 NaN
 NaN
 NaN
 NaN

But this bit pattern would provide a less-ridiculous -1 for any BitSigned array, so the value would want to be type-specific.
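Both points can be checked with a minimal scalar example (nothing assumed beyond Base):

```julia
bits = typemax(UInt64)             # all-ones 64-bit pattern, 0xffffffffffffffff
isnan(reinterpret(Float64, bits))  # true: all-ones is a NaN for floats
reinterpret(Int64, bits) == -1     # true: but a plausible -1 for signed integers
```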

That said, a person can misuse any function in an infinity of different ways so I don't find this very compelling. Also, while writing the first few bytes is $O(1)$, it also requires writing the memory which means bringing it into cache. While it may not be devastating to performance in most cases (there's an okay chance it's about to be accessed anyway), it doesn't seem worth doing recreationally.

@PallHaraldsson
Contributor Author

writing the first few bytes is O(1), it also requires writing the memory which means bringing it into cache.

"bring into" isn't a wording I would use. Reading from memory is costly, may 300 cycles, "brings in", writing, is almost free, it would go to L1 cache and stay there until evicted, but most likely overwritten anyway (in correct code ALWAYS).

In some CPUs, reading floats bypasses the L1 cache and goes to L2, to preserve the smaller, more valuable L1 cache. I'm not sure, but writes might too, at least in theory. And nobody is waiting on writes, so their latency is fully hidden. That's why I think this is effectively free, but a good warning to users. Better than an "occasionally NaN" surprise, we want it always; it's a good surprise.

@mkitti
Contributor

mkitti commented Nov 8, 2023

similar should default to zero initialization but be able to take undef as an argument or keyword argument to get the current behavior.

It would be nice if we had more array initializers than undef, missing, and nothing. Why not zero?

In ArrayInitializers.jl, I made a zeroinit type for example:

julia> using ArrayInitializers

julia> Array{Float64}(zeroinit, 5)
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

Also, don't forget about calloc.

@mbauman
Member

mbauman commented Nov 8, 2023

I mean, uninitialized is the third word in its docstring.
