Implement OrdValBatch without retain_from #419

Merged — 4 commits into TimelyDataflow:master from new_ord_batch, Nov 20, 2023

Conversation

frankmcsherry (Member)

This PR provides a re-implementation of OrdValBatch with a few properties:

  1. it is more direct than the existing implementation, which is built from trie layers,
  2. it only writes advanced and consolidated updates, rather than rewriting them later,
  3. it is amenable to container implementations for Vec<(Time, Diff)>,
  4. it is amenable to compact Vec<(Time, Diff)> representations (RLE).

There might be other things to like about it. In the fullness of time it would mean we could remove the trace/layers module, because the direct implementations aren't much more complicated, and in some ways are much simpler because of their directness.
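For orientation, here is a minimal sketch of what such a direct layout can look like, assuming flat key/value/update vectors tied together by offset lists. The names and fields below are illustrative assumptions, not the PR's actual definitions:

```rust
/// Illustrative sketch of a direct, offset-based batch layout.
/// All names here are hypothetical stand-ins for the PR's types.
struct OrdValBatchSketch<K, V, T, R> {
    /// Sorted, deduplicated keys.
    keys: Vec<K>,
    /// `keys_offs[i] .. keys_offs[i+1]` bounds the values for `keys[i]`,
    /// so this has length `keys.len() + 1`.
    keys_offs: Vec<usize>,
    /// Values, grouped by key.
    vals: Vec<V>,
    /// `vals_offs[j] .. vals_offs[j+1]` bounds the updates for `vals[j]`.
    vals_offs: Vec<usize>,
    /// Advanced, consolidated `(time, diff)` updates. Property 3 above is
    /// about abstracting this `Vec` behind a container trait; property 4
    /// is about compressing runs of repeated `(time, diff)` pairs.
    updates: Vec<(T, R)>,
}

impl<K, V, T, R> OrdValBatchSketch<K, V, T, R> {
    /// The updates recorded for the `j`-th value.
    fn updates_for(&self, j: usize) -> &[(T, R)] {
        &self.updates[self.vals_offs[j]..self.vals_offs[j + 1]]
    }
}
```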

This PR passes tests, but it probably wants a fair bit of exercise to see if it exactly tracks the existing implementations.

cc: @antiguru

Comment on lines +184 to +192
// Normally this would be `self.updates.len()`, but we have a clever compact encoding.
// Perhaps we should count such exceptions to the side, to provide a correct accounting.
frankmcsherry (Member Author)
No clever compact encoding yet, so ignore this comment.
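For what the comment above gestures at, a minimal sketch of counting such exceptions to the side, using hypothetical names invented for illustration rather than taken from the PR:

```rust
// Hypothetical wrapper sketching the "count exceptions to the side" idea:
// if a compact encoding elides repeated updates, the logical length is the
// stored length plus a separately maintained count of elided entries.
struct UpdateStorage<T, R> {
    /// Physically stored `(time, diff)` updates.
    updates: Vec<(T, R)>,
    /// Updates elided by the compact encoding, counted to the side so
    /// that `len()` can still report a correct logical total.
    elided: usize,
}

impl<T, R> UpdateStorage<T, R> {
    /// Logical number of updates: stored plus elided.
    fn len(&self) -> usize {
        self.updates.len() + self.elided
    }
}
```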

frankmcsherry (Member Author)

The most recent commit reorganizes the two spine implementations, and re-exports ord.rs's spines (the old ones) as ValSpine and KeySpine from their shared module root. All direct uses of e.g. OrdValSpine are replaced with ValSpine. The notable exception is columnation.rs which is meant to exercise columnation and non-columnation spines, so I left it pointing at the specific ones.
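A minimal sketch of that arrangement, with illustrative stand-in contents (the real modules define full spine types; only the re-export pattern is the point here):

```rust
// Sketch of the shared module root. Module and type bodies are
// illustrative stand-ins, not the PR's actual definitions.
pub mod ord {
    // The existing trie-layer spines.
    pub struct OrdValSpine;
    pub struct OrdKeySpine;
}
pub mod ord_neu {
    // The new direct implementations ("neu" in the benchmarks below).
    pub struct OrdValSpine;
    pub struct OrdKeySpine;
}

// The module root re-exports the old spines under neutral names, so
// downstream code writes `ValSpine`/`KeySpine` and the backing
// implementation can be swapped in one place later.
pub use ord::{OrdKeySpine as KeySpine, OrdValSpine as ValSpine};
```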

frankmcsherry (Member Author)

Some light performance investigation suggests that this is not universally worse than the ord.rs spine. Running the bfs example with n=10^6, m=2x10^6, batches of size 1000, 1000 rounds completes in:

ord    42.236413s
neu    39.531365375s

Running with n=10^8 and m=2x10^8, it takes both of them roughly the same time to load the data:

ord    38.825862291s
neu    37.227272667s

Anecdotally through Activity Monitor, it appeared that ord went up to about 7GB where neu stayed around 3GB. That was very unscientific on my part, though.

frankmcsherry merged commit b82c3ee into TimelyDataflow:master on Nov 20, 2023
1 check passed
frankmcsherry deleted the new_ord_batch branch on November 20, 2023 at 14:14