randomize hash values per process #37166

Open
StefanKarpinski opened this issue Aug 23, 2020 · 5 comments
Labels
hashing · randomness (Random number generation and the Random stdlib) · security (System security concerns and vulnerabilities)

Comments

@StefanKarpinski
Member

StefanKarpinski commented Aug 23, 2020

Other languages randomize their hash values per process to protect against DoS attacks that intentionally cause hash table collisions. This also prevents users from relying on accidental dictionary ordering in their code. Even in languages where dictionaries are ordered, randomizing the hashing per process is a good DoS prevention measure: the change is invisible (because of the ordering), but since attackers can't predict hash collisions, they can't force them. Regardless of whether we go with ordered dicts or not, we may want to do this.
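
To make the collision argument concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the 64-bucket table, the use of keyed BLAKE2b as a stand-in hash, and the names `bucket_of`, `craft_colliding_keys`, and `PUBLIC_SEED` are hypothetical and are not how Julia (or any particular language) hashes keys. It only shows that keys crafted against a publicly known seed stop colliding once the seed is drawn per process.

```python
import hashlib
import secrets
from collections import Counter

NUM_BUCKETS = 64  # hypothetical table size, kept small so collisions are easy to find


def bucket_of(key, seed):
    """Map a string key to a bucket using a keyed hash (stand-in for a seeded hash)."""
    digest = hashlib.blake2b(key.encode(), key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_BUCKETS


def craft_colliding_keys(seed, count, target=0):
    """Attacker's step: brute-force keys that all land in one bucket for a KNOWN seed."""
    keys = []
    i = 0
    while len(keys) < count:
        candidate = f"key{i}"
        if bucket_of(candidate, seed) == target:
            keys.append(candidate)
        i += 1
    return keys


def max_bucket_load(keys, seed):
    """Size of the fullest bucket after inserting all keys."""
    return max(Counter(bucket_of(k, seed) for k in keys).values())


PUBLIC_SEED = b"known-to-everyone"          # same in every process, so attackers know it
attack_keys = craft_colliding_keys(PUBLIC_SEED, 200)

load_fixed = max_bucket_load(attack_keys, PUBLIC_SEED)    # every key hits the same bucket
process_seed = secrets.token_bytes(16)                    # drawn once per process
load_random = max_bucket_load(attack_keys, process_seed)  # keys spread across buckets

print("fullest bucket, fixed seed:      ", load_fixed)
print("fullest bucket, per-process seed:", load_random)
```

With the fixed seed, all 200 crafted keys share one bucket, which is the degenerate case that turns table operations into linear scans. With a fresh seed the same keys are distributed roughly evenly, because the attacker could not precompute against a seed they never saw.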

@exaexa
Contributor

exaexa commented Jan 28, 2023

+1 for this.

To be honest, this is also tremendously useful for catching bugs where people depend on the unreliable ordering of elements in Dict/Set/unique/... . Technically, I'd even consider randomizing the order to be good practice (there's no performance hit, and problems get exposed sooner rather than later).

(Related thread in the Slack: https://julialang.slack.com/archives/C6A044SQH/p1674933909186449. As a user, you could currently try to replace hashindex in Base to get this behavior yourself; unfortunately, that crashes all Julia versions I tried (1.8 provided the most colorful results).)

@exaexa
Contributor

exaexa commented Jan 28, 2023

(By the way, I can probably write a patch; any hint on the randomness source would be very welcome, though. Could PID+localtime work?)

@gbaraldi
Member

It depends on how fancy you want to get here. If rand() is already defined at that point, then it's a good option; otherwise, ccall into jl_rand or, if you want to get fancy (which you might), uv_rand.

@rfourquet
Member

There is also Libc.rand.
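
On the choice of randomness source, the following Python sketch illustrates why a PID+localtime seed is weaker than one taken from the operating system. It is an illustration under assumptions, not a proposal for the patch: `pid_time_seed`, `os_entropy_seed`, and `recover_pid_time_seed` are hypothetical names, the way PID and time are mixed is made up, `os.urandom` merely stands in for whichever OS-backed source Julia would use, and the PID and time window are kept artificially small so the search finishes instantly.

```python
import os
import time


def pid_time_seed(pid, unix_seconds):
    """Hypothetical PID+localtime seed: two guessable values packed into 64 bits."""
    return ((pid << 32) ^ unix_seconds) & 0xFFFFFFFFFFFFFFFF


def os_entropy_seed():
    """Seed drawn from the operating system's entropy source."""
    return int.from_bytes(os.urandom(8), "little")


def recover_pid_time_seed(target, pid_range, time_range):
    """Attacker's step: enumerate plausible PIDs and start times until the seed matches."""
    for pid in pid_range:
        for t in time_range:
            if pid_time_seed(pid, t) == target:
                return pid, t
    return None


# Assumed victim process: a PID of 4321 that started "a few seconds ago".
victim_pid = 4321
victim_time = int(time.time()) - 3
victim_seed = pid_time_seed(victim_pid, victim_time)

# The attacker only needs rough bounds on both inputs.
now = int(time.time())
recovered = recover_pid_time_seed(
    victim_seed,
    pid_range=range(1, 5000),
    time_range=range(now - 10, now + 1),
)

print("recovered (pid, start time):", recovered)
print("OS entropy seed has 64 unpredictable bits:", hex(os_entropy_seed()))
```

The demo assumes the attacker can test a guess directly against the seed; in practice they would test it through observed behavior, which is slower but searches the same small space. The PID+time space here is about 55,000 candidates, and even a realistic window stays far below the 2^64 possibilities of an OS-provided seed.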

@nsajko added the randomness (Random number generation and the Random stdlib), hashing, and security (System security concerns and vulnerabilities) labels Dec 5, 2024
@PallHaraldsson
Contributor

PallHaraldsson commented Dec 5, 2024

We should likely do this since:
https://ocert.org/advisories/ocert-2011-003.html

The attacker, using specially crafted HTTP requests, can lead to a 100% of CPU usage which can last up to several hours depending on the targeted application and server performance, the amplification effect is considerable and requires little bandwidth and time on the attacker side.

https://docs.python.org/3/using/cmdline.html

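Python randomizes its hashes, and setting a fixed seed, e.g. as follows, turns the per-process randomization off (PYTHONHASHSEED=0 disables it entirely):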

PYTHONHASHSEED=1 python3

Python also has -R to control this (it enables randomization, but since that is now on by default, I'm not sure it is ever used, except to override the env var). It seems like we might want a way to disable randomization for debugging reasons.

I noticed Python randomizes hashes for strings too. So likely all hashes are just randomized. I'm not strictly sure that's needed, but it is the simplest implementation. Would non-randomized hashes for e.g. strings and integers be a risk on their own, or only when used in a Dict?
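
The Python behavior described above can be checked with a short script. This is a sketch that assumes a CPython interpreter reachable through `sys.executable`; `hash_in_fresh_process` is a hypothetical helper, not part of any library. It starts fresh interpreters with and without `PYTHONHASHSEED` and compares the hash of a string and of a small integer. In CPython the documented salting applies to `str` and `bytes` objects, while the hash of a small `int` is the integer itself, so not all hashes are randomized.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expr, hashseed=None):
    """Evaluate hash(<expr>) in a new Python process and return the result.

    hashseed=None leaves PYTHONHASHSEED unset, so randomization stays on (the default).
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if hashseed is not None:
        env["PYTHONHASHSEED"] = str(hashseed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Strings: a fixed seed is reproducible across processes, the default is not.
fixed_a = hash_in_fresh_process("'abc'", hashseed=1)
fixed_b = hash_in_fresh_process("'abc'", hashseed=1)
random_a = hash_in_fresh_process("'abc'")
random_b = hash_in_fresh_process("'abc'")

# Small integers: the hash is the value itself, with or without a seed.
int_fixed = hash_in_fresh_process("42", hashseed=1)
int_random = hash_in_fresh_process("42")

print("str hash stable with PYTHONHASHSEED=1:", fixed_a == fixed_b)
print("str hash stable with default settings:", random_a == random_b)
print("int hash with seed / without seed:    ", int_fixed, int_random)
```

A debugging switch along the lines of `PYTHONHASHSEED` is what makes runs reproducible here, which is the use case raised above for being able to disable randomization.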
