Execution traces
It is hard to generate realistic calls to a hash function for microbenchmarking. We recorded traces of a Backtrace server's initial hash table population phase with a synthetic data set that is representative of (some of) our production workload. We also recorded UMASH calls during steady state, again with synthetic but representative data and requests.
This is the raw output from `perf script` with UMASH tracepoints
enabled. Each line has the same general shape:
```
coronerd/3 18410 [002] 6998482.097358: sdt_libumash:umash_full: (5630566f8919) arg1=94765633077920 arg2=1 arg3=139594106184935 arg4=64
coronerd/8 18415 [012] 6998481.440851: sdt_libumash:umash_fp_init: (5630566f90b4) arg1=139667898878592 arg2=94765624153984
coronerd/2 18409 [001] 6998481.440856: sdt_libumash:umash_sink_update: (5630566f8621) arg1=139668257941328 arg2=16 arg3=139668257941216 arg4=16
coronerd/3 18410 [002] 6998481.440947: sdt_libumash:umash_fp_digest: (5630566f91dc) arg1=139668146702976
```
Everything left of the first colon is metadata: thread name, thread id, core id, and timestamp. I expect only the tid to be useful, for reconstructing call sequences from this parallel trace.
The tracepoint name after the first colon matches the first two
arguments to `DTRACE_PROBEn` in `umash.c`. After that comes the
address of the trace instruction (not useful for benchmarking), and
finally the tracepoint arguments provided to `DTRACE_PROBEn`.
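
For example, here's a minimal Python sketch that splits one such line into the fields we care about; the layout is inferred from the sample output above, so treat the regular expression as an assumption about this particular trace format rather than a guarantee about `perf script` output in general:

```python
import re

# Field layout inferred from the sample `perf script` lines above, e.g.:
#   coronerd/3 18410 [002] 6998482.097358: sdt_libumash:umash_full: \
#       (5630566f8919) arg1=... arg2=1 arg3=... arg4=64
LINE_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<tid>\d+)\s+\[(?P<core>\d+)\]\s+"
    r"(?P<timestamp>\d+\.\d+):\s+sdt_libumash:(?P<probe>\w+):\s+"
    r"\((?P<ip>[0-9a-f]+)\)(?P<args>(?:\s+arg\d+=\d+)*)\s*$"
)


def parse_line(line):
    """Return (tid, probe name, [arg1, arg2, ...]) for one trace line."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # skip lines mangled by recording glitches
    args = [int(v) for v in re.findall(r"arg\d+=(\d+)", match.group("args"))]
    return int(match.group("tid")), match.group("probe"), args
```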
Each call has a different list of arguments (see `umash.c` for
confirmation; a decoding sketch in code follows the list):
- `sdt_libumash:umash_full`: address of the `umash_params`, which UMASH to compute (0 or 1), address of the bytes to hash, number of bytes to hash.
- `sdt_libumash:umash_fp_init`: address of the initialised `umash_fp_state`, address of the `umash_params`.
- `sdt_libumash:umash_sink_update`: address of the sink (`umash_state` or `umash_fp_state`), number of bytes remaining in the internal buffer, address of the bytes to hash, number of bytes to hash.
- `sdt_libumash:umash_fp_digest`: address of the `umash_fp_state` to digest.
- `sdt_libumash:umash_fprint`: address of the `umash_params`, address of the bytes to hash, number of bytes to hash (does not appear in the trace files).
- `sdt_libumash:umash_init`: address of the initialised `umash_state`, address of the `umash_params`, which UMASH to compute (0 or 1) (does not appear in the trace files).
- `sdt_libumash:umash_digest`: address of the digested `umash_state` (does not appear in the trace files).
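
In code, that argument layout can be restated as a small lookup table. This sketch merely transcribes the list above; the field names are mine, not `umash.c`'s:

```python
# Positional argument names for each tracepoint; this restates the
# list above (the names themselves are assumptions for readability).
PROBE_ARGS = {
    "umash_full": ("params", "which", "data", "n_bytes"),
    "umash_fp_init": ("fp_state", "params"),
    "umash_sink_update": ("sink", "buffered", "data", "n_bytes"),
    "umash_fp_digest": ("fp_state",),
    # The last three never show up in the trace files.
    "umash_fprint": ("params", "data", "n_bytes"),
    "umash_init": ("state", "params", "which"),
    "umash_digest": ("state",),
}


def decode(probe, args):
    """Zip a probe's raw argN values with the names above."""
    return dict(zip(PROBE_ARGS[probe], args))
```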
When traces include incremental fingerprinting calls, we can expect
each thread to work on only one `umash_fp_state` at a time: the trace
for each thread will always match the regular expression
`(([umash_fp_init] [umash_sink_update]* [umash_fp_digest]) | [umash_full])*`,
although recording issues may have introduced irregularities.
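
Assuming the parser sketched earlier, we can check that property by running each thread's event stream through a small state machine that tracks whether a fingerprint is in flight:

```python
def check_fp_grammar(events):
    """Flag threads that deviate from the expected call pattern.

    `events` is an iterable of (tid, probe, args) tuples, e.g. the
    output of parse_line() above.  Returns the set of offending tids;
    a non-empty result most likely points at recording irregularities.
    """
    in_flight = {}  # tid -> True while a fingerprint is being built
    bad = set()
    for tid, probe, _args in events:
        open_fp = in_flight.get(tid, False)
        if probe in ("umash_full", "umash_fp_init"):
            ok = not open_fp  # neither may start mid-fingerprint
            in_flight[tid] = probe == "umash_fp_init"
        elif probe in ("umash_sink_update", "umash_fp_digest"):
            ok = open_fp  # both need an open fingerprint
            in_flight[tid] = probe == "umash_sink_update"
        else:
            ok = True  # probes the pattern doesn't cover
        if not ok:
            bad.add(tid)
    return bad
```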
Our synthetic hash table population phase makes more than 150 million calls to UMASH. It's not clear that there is any large-scale pattern here (at the small end, we do observe rehashing calls), and that's a lot of data, so we only have truncated subsequences of the trace:
- `startup-1M.2020-08-28.trace.bz2` has the first million calls to UMASH functions during the batch hash table phase.
- `startup-tail-1M.2020-08-28.trace.bz2` has the last million calls to UMASH functions during the batch phase.
During this phase, we only call `umash_full`, arguably with an
off-by-one, since we include NUL terminators in the hashed data. It
seems likely that we will fix that off-by-one and always subtract one
from the input length, rather than focusing on "odd" input lengths.
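
A replay harness that wants to model that fix only needs a one-line adjustment when it regenerates calls from these traces; a sketch, assuming events parsed as above:

```python
def adjusted_length(probe, n_bytes):
    # The startup traces include the trailing NUL in umash_full's
    # length; drop it to model the planned fix.
    return n_bytes - 1 if probe == "umash_full" else n_bytes
```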
Steady state sees a more diverse set of calls to UMASH: we interleave phases of heavy (string) hashing with phases that use the incremental fingerprinting interface, each with different call patterns.
`steady-state.2020-08-28.trace.bz2` records all of these calls during a representative workload (2.5 million calls into UMASH over 26 seconds).
This phase exercises `umash_full` (often with C-string arguments that
must first be passed to `strlen`), and incremental fingerprinting:
`umash_fp_init`, `umash_sink_update`, and `umash_fp_digest`.
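
Since the probes record only addresses and lengths, never the hashed bytes themselves, a replay-based microbenchmark has to synthesize inputs. Here is a sketch that replays the `umash_full` calls from a parsed trace against a stand-in `hash_bytes` callback (a hypothetical wrapper around the hash under test, e.g. via ctypes; the name is mine):

```python
import itertools


def replay(events, hash_bytes):
    """Replay the umash_full calls recorded in a parsed trace.

    `hash_bytes` is a stand-in for the hash under test (hypothetical;
    e.g. a ctypes wrapper around the C function).  The traced bytes
    were never recorded, so each distinct (address, length) pair gets
    a deterministic synthetic buffer: repeated inputs in the trace map
    to repeated inputs in the replay.
    """
    buffers = {}
    for _tid, probe, args in events:
        if probe != "umash_full":
            continue  # incremental calls would need per-thread state
        _params, _which, data, n_bytes = args
        key = (data, n_bytes)
        if key not in buffers:
            filler = itertools.islice(itertools.cycle(b"umash"), n_bytes)
            buffers[key] = bytes(filler)
        hash_bytes(buffers[key])
```

Grouping events by tid first (and keeping one fingerprint state per thread) would extend this to the incremental calls while preserving the per-thread interleaving discussed above.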