Skip to content
This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

small hash #102

Open
rurban opened this issue Dec 30, 2015 · 5 comments
Open

small hash #102

rurban opened this issue Dec 30, 2015 · 5 comments
Assignees
Milestone

Comments

@rurban
Copy link
Member

rurban commented Dec 30, 2015

optimize hashes with <= 3-5 keys to a simple array of keys and values with linear lookup.

HvSMALL(hv) / XHvSMALL(xhv) is either checking HvMAX < 7, or a flag. If a flag the very first HE* entry needs to be a non-ptr tag (& 0x1).
We'd need a flag with inlined HEs and overlong keys, to omit HvSMALL optims with such long keys.
We cannot the hv_aux based HvFLAGS with normal HvSMALL hashes, esp. when inlined.

The best would be a he-array alike inlined len/char*/flags/val array to be cache concious. (as in #24 feature/gh24-he-array). The len really should be run-length encoded, then the flags needed for hash cmp need to come first.
However at first we start with simple HE* arrays. (array of ptrs, not values)
The last array element needs to have an NULL sentinel, so we cannot use all 7 HE*, only 6.

But there are many more simple hash optims, which we do first.

  • extract uncommon magical code from hv_common
  • add __builtin_ctz support (count trailing zeros) and use it instead of division on DO_HSPLIT (done with the builtin-ctz branch)
  • pre-extend hashes as in aassign with av_extend, when the number of keys is on the stack. This speeds up all the big hash inits (e.g. warnings.pm), needing no costly series of splits during initialization.
@rurban rurban self-assigned this Dec 30, 2015
@wollmers
Copy link
Contributor

I estimate you can use linear or serial (unsorted) lookup up to 100 keys or even more, depending on benchmarks.

In my port of LCS::BV from Perl to C I began with Bob Jenkins hash and ended the tuning using VLAs (variable length arrays) on the stack, the array serially filled (\0 terminated). See llcs_seq_a() and the used hash_setpos() and hash_getpos(). With Bob Jenkins I get 250 kHz (cases per second) on i5@1500, with serial VLAs 7.5 MHz, thus factor 30x. The calloc variant llcs_seq() comes at 4 MHz.

Of course in my example I can benefit from the known restrictions: maximum size, keys strings immutable, typed values (uint_64).

@rurban
Copy link
Member Author

rurban commented Apr 27, 2016

So many? I thought I only want to fill one cache line, so just very few
keys. But I'll benchmark it soon, when I got more time. Other langs tested
3-5, if I remember.

On Wed, Apr 27, 2016, 20:59 Helmut Wollmersdorfer [email protected]
wrote:

I estimate you can use linear or serial (unsorted) lookup up to 100 keys
or even more, depending on benchmarks.

In my port of LCS::BV
https://github.com/wollmers/LCS-BV/tree/master/lib/LCS from Perl to C I
began with Bob Jenkins hash and ended the tuning using VLAs (variable
length arrays) on the stack, the array serially filled (\0 terminated). See
llcs_seq_a()
https://github.com/wollmers/c-lcs-bv/blob/master/lcstest.c#L105 and the
used hash_setpos() and hash_getpos(). With Bob Jenkins I get 250 kHz
(cases per second) on i5@1500, with serial VLAs 7.5 MHz, thus factor 30x.
The calloc variant llcs_seq()
https://github.com/wollmers/c-lcs-bv/blob/master/lcstest.c#L144 comes
at 4 MHz.

Of course in my example I can benefit from the known restrictions: maximum
size, keys strings immutable, typed values (uint_64).


You are receiving this because you were assigned.
Reply to this email directly or view it on GitHub
#102 (comment)

@wollmers
Copy link
Contributor

You should trust only numbers you benchmarked yourself;-)

Hash is said to have complexity O(1). But as always it is O(1*k), where k is the implementation factor.

Serial has O((n/2)*k). A break even point of n=4 between hash and serial would need k_hash = 2 * k_serial. I.e. the hash algorithm executes only the double amount of instructions compared to one iteration of the loop of serial. My serial has 3 instructions (C operators) in the loop including conditions. So for a break even n=4 it would need a hash function (locating the entry in the array) to only use 6 instructions.

I didn't optimize for cache friendlyness directly. Serial just maps a nearly indefinite (sparse) alphabet to a minimal one (none sparse) and keeps nearly the order of filling, which is memory and cache friendly. Hash algorithms (if not perfect hashes) map sparse to not so sparse, but still sparse.

@rurban
Copy link
Member Author

rurban commented Jul 17, 2016

I went with 7 because this is the initial calloced size. But it doesn't work yet, so I cannot benchmark it.

rurban pushed a commit that referenced this issue Nov 25, 2018
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Nov 25, 2018
avoid hash calculation for a short number of keys.
calloc the first 7 words of HvARRAY. if we add one to the 6th entry
we need to split it, as the 7th, the last, is needed as NULL sentinel.

on split a small hash, we need to allocate a fresh array to move
the hashed entries to. This can be optimized furtheron. (alloc 2x)

on insert a new entry at 7th, we can avoid a split when placeholders exist.
just replace it then.

See #102

WIP: the standard operations work, but use constant fails.
Currently 13% slower.
rurban pushed a commit that referenced this issue Mar 18, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Apr 1, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Apr 5, 2019
avoid hash calculation for a short number of keys.
calloc the first 7 words of HvARRAY. if we add one to the 6th entry
we need to split it, as the 7th, the last, is needed as NULL sentinel.

on split a small hash, we need to allocate a fresh array to move
the hashed entries to. This can be optimized furtheron. (alloc 2x)

on insert a new entry at 7th, we can avoid a split when placeholders exist.
just replace it then.

See #102

WIP: the standard operations work, but use constant fails.
Currently 13% slower.
rurban pushed a commit that referenced this issue Apr 5, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Apr 5, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Apr 30, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jun 12, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jun 24, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jun 26, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jun 27, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jun 28, 2019
avoid hash calculation for a short number of keys.
calloc the first 7 words of HvARRAY. if we add one to the 6th entry
we need to split it, as the 7th, the last, is needed as NULL sentinel.

on split a small hash, we need to allocate a fresh array to move
the hashed entries to. This can be optimized furtheron. (alloc 2x)

on insert a new entry at 7th, we can avoid a split when placeholders exist.
just replace it then.

See #102

WIP: the standard operations work, but use constant fails.
Currently 13% slower.
rurban pushed a commit that referenced this issue Jul 1, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jul 2, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jul 2, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jul 3, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Aug 25, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Dec 17, 2019
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jan 19, 2020
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
rurban pushed a commit that referenced this issue Jan 19, 2020
Calculate hashes on demand, but not store it in a HEK
to make HEK shorter to fill more entries into a cache line.
HEK_HASH(hek) is now invalid and gone.
Use the new HeHASH_calc(he), HEK_HASH_calc(hek), SvSHARED_HASH_calc(sv)
instead.
See http://www.ilikebigbits.com/blog/2016/8/28/designing-a-fast-hash-table
for benchmarks (HashCache).

And using 4 tests in the hot hash loop also makes not much sense,
when checking the length and the string is enough to weed out
collisions.
This strategy, recomputing the hash wehen needed, is so far 1-7% slower,
but we hope to get to speed with the HeARRAY patch. See below.

The endgoal is to get rid of linked lists and store the collisions
inlined in consecutive memory, in a HekARRAY. (len,cmp-flags,char*,other-flags,val)
Measurements in "Cache-Conscious Collision Resolution in String Hash Tables"
by Nikolas Askitis and Justin Zobel, Melbourne 2005 show that this is the
fastest strategy for Open Hashing (chained) tables.
See GH #24 and GH #102

The next idea is to use MSB varint encoding of the str length in a HEK,
because our strings are usually short, len < 63, fits into one byte.
We can then merge it with the cmp-flags, the flags only needed for comparison.
See https://techoverflow.net/blog/2013/01/25/efficiently-encoding-variable-length-integers-in-cc/
or just <63 one byte, >63 MSB: I32 len.
Note that the 1st MSB bit is already taken for UTF8.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants