randomize hash values per process #37166

StefanKarpinski · 2020-08-23T16:06:30Z

Other languages randomize their hash values per process to protect against DoS attacks that intentionally cause hash table collisions. This also forces users not to rely on accidental dictionary ordering in their code. Even in languages where dictionaries are ordered, randomizing the hashing per process is a good DoS prevention measure: the change is invisible to users (because iteration follows insertion order, not hash order), but since attackers can't predict hash collisions, they can't force them. Regardless of whether we go with ordered dicts or not, we may want to do this.
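For illustration, here is a minimal Python sketch of why a per-process seed helps. It is not how Julia (or any particular language) implements hashing: `bucket`, `colliding_keys`, `NBUCKETS`, and `PUBLIC_SEED` are hypothetical names, and keyed BLAKE2b from `hashlib` stands in for whatever seeded hash a runtime would actually use. With a publicly known seed, an attacker can precompute keys that all land in one bucket; with a secret per-process seed, the same keys scatter.

```python
import hashlib
import secrets

NBUCKETS = 64  # hypothetical table size, for illustration only


def bucket(key: bytes, seed: bytes) -> int:
    """Map a key to a bucket using a keyed hash (BLAKE2b as a stand-in)."""
    digest = hashlib.blake2b(key, digest_size=8, key=seed).digest()
    return int.from_bytes(digest, "little") % NBUCKETS


def colliding_keys(seed: bytes, count: int) -> list[bytes]:
    """Brute-force `count` keys that all fall into bucket 0 for a KNOWN seed."""
    keys = []
    i = 0
    while len(keys) < count:
        candidate = f"key{i}".encode()
        if bucket(candidate, seed) == 0:
            keys.append(candidate)
        i += 1
    return keys


# Assumption: the attacker knows the seed because it is the same in every process.
PUBLIC_SEED = bytes(16)
attack = colliding_keys(PUBLIC_SEED, 50)

# Under the known seed, every attack key hits the same bucket (worst case for the table).
fixed_buckets = {bucket(k, PUBLIC_SEED) for k in attack}

# Under a fresh secret seed, the precomputed keys no longer collide systematically.
process_seed = secrets.token_bytes(16)
random_buckets = {bucket(k, process_seed) for k in attack}

print("distinct buckets, known seed: ", len(fixed_buckets))
print("distinct buckets, secret seed:", len(random_buckets))
```

The first count is always 1 by construction; the second varies from run to run but is, with overwhelming probability, much larger.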

exaexa · 2023-01-28T20:11:06Z

+1 for this.

To be honest, this is also tremendously useful for catching bugs where people depend on the unreliable ordering of Dict/Set/unique/... . Technically I'd even consider randomizing the order to be good practice (there's no performance hit and problems get exposed sooner rather than later).

(Related thread on Slack: https://julialang.slack.com/archives/C6A044SQH/p1674933909186449 -- as a user you can currently try to replace hashindex in Base to get this behaviour yourself; unfortunately that crashed every Julia version I tried (1.8 gave the most colorful results).)

exaexa · 2023-01-28T20:21:27Z

(btw, I can probably write a patch; any hint on the randomness source would be very welcome though. PID+localtime could work?)

gbaraldi · 2023-01-28T22:13:55Z

It depends on how fancy you want to get here. If rand() is already defined at that point then it's a good option; otherwise ccall into jl_rand or, if you want to get fancy (which you might), uv_rand.

rfourquet · 2023-01-30T08:36:45Z

There is also Libc.rand.
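On the PID+localtime idea: a rough Python sketch of why the choice of randomness source matters for the DoS use case. Everything here is hypothetical (`weak_seed`, `recover_weak_seed`, the PID range, and the time slack are made up for illustration), and it says nothing about what the Julia functions mentioned above actually do. The point is only that a seed built from PID and start time has a small search space if the attacker can guess roughly when the process started, whereas 64 bits from the OS entropy pool does not.

```python
import secrets


def weak_seed(pid: int, unix_time: int) -> int:
    """Hypothetical seed derived only from PID and start time (seconds)."""
    return (pid << 32) ^ unix_time


def recover_weak_seed(seed: int, approx_time: int, slack: int = 2, max_pid: int = 32768):
    """Enumerate every (pid, time) pair within the guessed window.

    A real attacker would not see the seed itself; they would test each
    candidate against observable behaviour. Comparing seeds directly just
    keeps the sketch short.
    """
    for pid in range(1, max_pid + 1):
        for t in range(approx_time - slack, approx_time + slack + 1):
            if weak_seed(pid, t) == seed:
                return pid, t
    return None


# The "victim" process started one second after the attacker's estimate.
victim = weak_seed(4242, 1_700_000_001)
recovered = recover_weak_seed(victim, approx_time=1_700_000_000)
print(recovered)  # (4242, 1700000001), found after at most 32768 * 5 candidates

# By contrast, a seed drawn from the OS entropy source has 2**64 possibilities.
strong = secrets.randbits(64)
print(0 <= strong < 2**64)  # True
```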

PallHaraldsson · 2024-12-05T21:55:29Z

We should likely do this since:
https://ocert.org/advisories/ocert-2011-003.html

"The attacker, using specially crafted HTTP requests, can lead to a 100% of CPU usage which can last up to several hours depending on the targeted application and server performance, the amplification effect is considerable and requires little bandwidth and time on the attacker side."

https://docs.python.org/3/using/cmdline.html

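Python randomizes by default, and e.g. this pins the seed so that hashes are reproducible across runs (PYTHONHASHSEED=0 turns randomization off entirely):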

PYTHONHASHSEED=1 python3

Python has -R to control this (it enables randomization, but that is now on by default, so I'm not sure it's ever used, except to override the env var). It seems like we might want a way to disable this for debugging reasons.

I noticed Python randomizes hashes for strings too. Its docs say str and bytes hashes are salted, while integer hashes stay deterministic, so not all hashes are randomized. Randomizing everything might be the simplest implementation, though I'm not sure it's strictly needed. Would non-randomized hashes for e.g. strings and integers be a risk on their own, or only when used in a Dict?
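The Python behaviour can be checked with a small script. This is a sketch assuming CPython 3 with default settings; `hashes_in_child` is a hypothetical helper that starts a fresh interpreter (hash randomization is fixed at process start-up) and reports the hash of one string and one integer.

```python
import os
import subprocess
import sys


def hashes_in_child(seed=None):
    """Run a fresh interpreter and return (hash('julia'), hash(12345)).

    `seed` is passed through PYTHONHASHSEED; None leaves randomization on.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # do not inherit the parent's setting
    if seed is not None:
        env["PYTHONHASHSEED"] = seed
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('julia'), hash(12345))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    str_hash, int_hash = result.stdout.split()
    return int(str_hash), int(int_hash)


# Same fixed seed in two processes: string hashes agree.
fixed_a = hashes_in_child("1")
fixed_b = hashes_in_child("1")
print("fixed seed, string hashes equal: ", fixed_a[0] == fixed_b[0])

# Default (randomized) in two processes: string hashes differ in practice.
random_a = hashes_in_child()
random_b = hashes_in_child()
print("random seed, string hashes equal:", random_a[0] == random_b[0])

# Small integers hash to themselves in CPython regardless of the seed.
print("integer hash:", fixed_a[1], random_a[1])
```

A fixed seed like this is the kind of debugging escape hatch the comment above asks about: runs become reproducible without removing randomization as the default.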

nsajko added the randomness, hashing, and security labels on Dec 5, 2024

adienes mentioned this issue Feb 2, 2025: switch MurmurHash3 to rapidhash ? #57235 (Open)

adienes mentioned this issue Feb 23, 2025: use rapidhash #57509 (Draft)
