-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random: allow string seeds #51527
Random: allow string seeds #51527
Conversation
a0394f0
to
046b803
Compare
stdlib/Random/src/RNGs.jl
Outdated
function hash_seed(str::AbstractString) | ||
ctx = SHA.SHA2_256_CTX() | ||
for chr in str | ||
SHA.update!(ctx, reinterpret(NTuple{4, UInt8}, UInt32(chr))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UInt32(chr)
may fail if the string is not valid UTF-8. This could use the codeunits(String(str))
instead of the Unicode codepoints to avoid this, like is done in SHA.jl
https://github.com/JuliaCrypto/SHA.jl/blob/e1af7dd0863dee14a83550faf4b6e08971993ce8/src/SHA.jl#L104-L105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's a drawback of this approach. But I didn't want to use codeunits
because it doesn't yield always the same result for different string types, e.g.
julia> using Strs
julia> codeunits(LatinStr("é"))
1-element Base.CodeUnits{UInt8, LatinStr}:
0xe9
julia> codeunits("é")
2-element Base.CodeUnits{UInt8, String}:
0xc3
0xa9
I thought rejecting invalid strings was not a big deal, and maybe useful even. But I could as well just replace UInt32(chr)
by chr
above, i.e. reinterpret chr
directly as NTuple{4, UInt8}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, with codeunits
you can convert the AbstractString
to a String
to ensure it is UTF-8 encoded. With reinterpreting chr
directly, you might need to convert to Char
because a string might have a different subtype of AbstractChar
as its characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah I missed your conversion to String
! Yeah that's definitely a possibility, I initially thought this could be unnecessary overhead, but it ends up being actually more efficient by directly calling SHA.update!
on codeunits(String(str))
, thank you :)
We used to be able to seed RNGs with a string, but that string was interpreted as the filename containing the actual seed. This was deprecated in #21359, in order to later allow using a string seed directly, which this patch does.
Co-authored-by: Nathan Zimmerberg <[email protected]>
a27d686
to
187f7d1
Compare
Co-authored-by: Nathan Zimmerberg <[email protected]>
Is (all of) this PR actually needed? We seem to have done fine without if before. Random is its own module, so not a huge issue to bloat it, since this doesn't affect Base. But Random is used for threads so I might be wrong about that. Are we adding this, since it's important and/or feature parity with some other e.g. Python or NumPy? Then I kind of want that. This functionality could hypothetically have gone into an external package, then just annoying to use... |
Not at all, just convenience.
Yes python supports that, and Rust supports that with a crate (mentioned in the "rust rand book", which I just found about).
This would have to commit type piracy, so this this wouldn't fly... |
We used to be able to seed RNGs with a string, but that string was interpreted as the filename containing the actual seed. This was deprecated in #21359, in order to later allow using a string seed directly, which this patch does.