Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random: allow string seeds #51527

Merged
merged 5 commits into from
Oct 10, 2023
Merged

Random: allow string seeds #51527

merged 5 commits into from
Oct 10, 2023

Conversation

rfourquet
Copy link
Member

We used to be able to seed RNGs with a string, but that string was interpreted as the filename containing the actual seed. This was deprecated in #21359, in order to later allow using a string seed directly, which this patch does.

@rfourquet rfourquet added the randomness Random number generation and the Random stdlib label Sep 30, 2023
function hash_seed(str::AbstractString)
ctx = SHA.SHA2_256_CTX()
for chr in str
SHA.update!(ctx, reinterpret(NTuple{4, UInt8}, UInt32(chr)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UInt32(chr) may fail if the string is not valid UTF-8. This could use the codeunits(String(str)) instead of the Unicode codepoints to avoid this, like is done in SHA.jl
https://github.com/JuliaCrypto/SHA.jl/blob/e1af7dd0863dee14a83550faf4b6e08971993ce8/src/SHA.jl#L104-L105

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a drawback of this approach. But I didn't want to use codeunits because it doesn't yield always the same result for different string types, e.g.

julia> using Strs

julia> codeunits(LatinStr("é"))
1-element Base.CodeUnits{UInt8, LatinStr}:
 0xe9

julia> codeunits("é")
2-element Base.CodeUnits{UInt8, String}:
 0xc3
 0xa9

I thought rejecting invalid strings was not a big deal, and maybe useful even. But I could as well just replace UInt32(chr) by chr above, i.e. reinterpret chr directly as NTuple{4, UInt8}.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, with codeunits you can convert the AbstractString to a String to ensure it is UTF-8 encoded. With reinterpreting chr directly, you might need to convert to Char because a string might have a different subtype of AbstractChar as its characters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I missed your conversion to String! Yeah that's definitely a possibility, I initially thought this could be unnecessary overhead, but it ends up being actually more efficient by directly calling SHA.update! on codeunits(String(str)), thank you :)

stdlib/Random/src/RNGs.jl Outdated Show resolved Hide resolved
rfourquet and others added 4 commits October 7, 2023 22:07
We used to be able to seed RNGs with a string, but that string was interpreted
as the filename containing the actual seed. This was deprecated in #21359, in
order to later allow using a string seed directly, which this patch does.
Co-authored-by: Nathan Zimmerberg <[email protected]>
Co-authored-by: Nathan Zimmerberg <[email protected]>
@PallHaraldsson
Copy link
Contributor

Is (all of) this PR actually needed? We seem to have done fine without if before. Random is its own module, so not a huge issue to bloat it, since this doesn't affect Base. But Random is used for threads so I might be wrong about that.

Are we adding this, since it's important and/or feature parity with some other e.g. Python or NumPy? Then I kind of want that. This functionality could hypothetically have gone into an external package, then just annoying to use...

@rfourquet
Copy link
Member Author

Is (all of) this PR actually needed?

Not at all, just convenience.

feature parity with some other e.g. Python

Yes python supports that, and Rust supports that with a crate (mentioned in the "rust rand book", which I just found about).

This functionality could hypothetically have gone into an external package

This would have to commit type piracy, so this this wouldn't fly...
But this is only a few lines of code, so I'm not too worried about bloat here :)

@rfourquet rfourquet merged commit f82f0d4 into master Oct 10, 2023
1 check passed
@rfourquet rfourquet deleted the rf/seed-string branch October 10, 2023 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants