
"Poison" similar arrays #52080

Closed
PallHaraldsson opened this issue Nov 8, 2023 · 7 comments

Comments

@PallHaraldsson
Contributor

I would suggest initializing the first bytes of an array to NaN, then NaN32 (just in case some code uses that type or reinterprets), then NaN16, for a total of 112 bits (14 bytes), less than a cache line on all CPUs. This can be done with a single store instruction on at least ARM64, which allows storing a pair of words.

For these concatenated bitstrings:

julia> bitstring(NaN)
"0111111111111000000000000000000000000000000000000000000000000000"

julia> bitstring(NaN32)
"01111111110000000000000000000000"

julia> bitstring(NaN16)
"0111111000000000"

This would guarantee (some) NaNs always, not just sometimes:
https://discourse.julialang.org/t/occasionally-nans-when-using-similar/48224/30?u=palli

We want to somehow discourage use of similar, or incorrect use of it, and getting a NaN signals that you need to think more. When you allocate an array (or the new memory type, or allocate in general, which is what similar is based on), you only need to store some metadata, not actually touch the allocated region; that is the point of similar, for performance.

So storing into even a prefix of it (or all of it) might be thought of as a performance issue, but most likely you're going to access the array later, maybe soon, and then you recover the cost of the store. It could even be faster, since a load is always slow if the data is not in cache, but not if you have stored to it first. Possibly this should be done for the end of the array too, reserving one more cache line.

That would help if the array is read from the end; you don't know from which end it will be read, if not from both (as with quicksort).
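The prefix store proposed above could be sketched roughly like this (`poison_prefix!` is a hypothetical helper name, not anything in Base; a sketch only, assuming a little-endian layout for the byte offsets):

```julia
# Hypothetical sketch (not in Base): store the concatenated NaN/NaN32/NaN16
# bit patterns into the first 14 bytes of a freshly allocated Float64 array.
function poison_prefix!(a::Vector{Float64})
    sizeof(a) < 14 && return a   # too small to hold all three patterns
    GC.@preserve a begin
        p = Ptr{UInt8}(pointer(a))
        unsafe_store!(Ptr{UInt64}(p),      reinterpret(UInt64, NaN))    # bytes 1-8
        unsafe_store!(Ptr{UInt32}(p + 8),  reinterpret(UInt32, NaN32))  # bytes 9-12
        unsafe_store!(Ptr{UInt16}(p + 12), reinterpret(UInt16, NaN16))  # bytes 13-14
    end
    return a
end

a = poison_prefix!(Vector{Float64}(undef, 4))
isnan(a[1])  # true: reading the "uninitialized" prefix now reliably yields NaN
```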

@PallHaraldsson PallHaraldsson changed the title Posion similar arrays "Poison" similar arrays Nov 8, 2023
@adienes
Contributor

adienes commented Nov 8, 2023

want to discourage use of similar,

wait, what? why?

@PallHaraldsson
Contributor Author

PallHaraldsson commented Nov 8, 2023

That was an incomplete quote, missing "or incorrect use of it".

What's proposed here would make such misuse more obvious, without slowing down correct (or incorrect) use of similar; it could even make it faster in the end.

@mbauman
Member

mbauman commented Nov 8, 2023

See #9147

@mikmoore
Contributor

mikmoore commented Nov 8, 2023

I'll remark that it's not necessary to include every size of NaN to get a NaN in an IEEEFloat array. An all-ones bit pattern is a NaN at every IEEE float width, so we just need one pattern as long as the longest IEEEFloat.

julia> reinterpret(Float64,Int64[-1])
1-element reinterpret(Float64, ::Vector{Int64}):
 NaN

julia> reinterpret(Float32,Int64[-1])
2-element reinterpret(Float32, ::Vector{Int64}):
 NaN
 NaN

julia> reinterpret(Float16,Int64[-1])
4-element reinterpret(Float16, ::Vector{Int64}):
 NaN
 NaN
 NaN
 NaN

But this bit pattern would provide a less-ridiculous -1 for any BitSigned array, so the value would want to be type-specific.
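Both points can be checked with a minimal scalar example (nothing assumed beyond Base):

```julia
bits = typemax(UInt64)             # all-ones 64-bit pattern, 0xffffffffffffffff
isnan(reinterpret(Float64, bits))  # true: all-ones is a NaN for floats
reinterpret(Int64, bits) == -1     # true: but a plausible -1 for signed integers
```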

That said, a person can misuse any function in an infinity of different ways so I don't find this very compelling. Also, while writing the first few bytes is $O(1)$, it also requires writing the memory which means bringing it into cache. While it may not be devastating to performance in most cases (there's an okay chance it's about to be accessed anyway), it doesn't seem worth doing recreationally.

@PallHaraldsson
Contributor Author

writing the first few bytes is O(1), it also requires writing the memory which means bringing it into cache.

"bring into" isn't a wording I would use. Reading from memory is costly, may 300 cycles, "brings in", writing, is almost free, it would go to L1 cache and stay there until evicted, but most likely overwritten anyway (in correct code ALWAYS).

In some CPUs, reading floats bypasses the L1 cache and goes to L2, to preserve the smaller, more valuable L1 cache. I'm not sure, but writes might too, at least in theory. And nobody is waiting on writes, so their latency is fully hidden. That's why I think this is effectively free, but a good warning to users. Better than an "occasionally NaN" surprise, we want it always; it's a good surprise.

@mkitti
Contributor

mkitti commented Nov 8, 2023

similar should default to zero initialization but be able to take undef as an argument or keyword argument to get the current behavior.

It would be nice if we had more array initializers than undef, missing, and nothing. Why not zero?

In ArrayInitializers.jl, I made a zeroinit type for example:

julia> using ArrayInitializers

julia> Array{Float64}(zeroinit, 5)
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

Also, don't forget about calloc.

@mbauman
Member

mbauman commented Nov 8, 2023

I mean, uninitialized is the third word in its docstring.
