This repository has been archived by the owner on Jun 1, 2023. It is now read-only.
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
HeArray: remove hek_hash and refcounted_he_hash
Calculate hashes on demand, but not store it in a HEK to make HEK shorter to fill more entries into a cache line. HEK_HASH(hek) is now invalid and gone. Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv) instead. See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table for benchmarks (HashCache). And using 4 tests in the hot hash loop also makes not much sense, when checking the length and the string is enough to weed out collisions. This strategy, recomputing the hash wehen needed, is so far 1-7% slower, but we hope to get to speed with the HeARRAY patch. See below. The endgoal is to get rid of linked lists and store the collisions inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val) Measurements in "Cache-Conscious Collision Resolution in String Hash Tables" by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the fastest strategy for Open Hashing (chained) tables. See GH #24 and GH #102 The next idea is to use MSB varint encoding of the str length in a HEK, because our strings are usually short, len < 63, fits into one byte. We can then merge it with the cmp-flags, the flags only needed for comparison. See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/ or just <63 one byte, >63 MSB: I32 len. Note that the 1st MSB bit is already taken for UTF8.
- Loading branch information