Implement OrdValBatch without retain_from #419

Merged — 4 commits into TimelyDataflow:master from new_ord_batch, Nov 20, 2023

Conversation

frankmcsherry (Member)

This PR provides a re-implementation of OrdValBatch with a few properties:

  1. it is more direct than the existing implementation, which is built from trie layers,
  2. it only writes advanced and consolidated updates, rather than rewriting them later,
  3. it is amenable to container implementations for Vec<(Time, Diff)>,
  4. it is amenable to compact Vec<(Time, Diff)> representations (RLE).

There might be other things to like about it. In the fullness of time it would mean we could remove the trace/layers module, because the direct implementations aren't much more complicated, and in some ways are much simpler because of their directness.
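For orientation, here is a minimal sketch of what such a direct layout can look like, assuming flat key/value/update vectors tied together by offset lists. The names and fields below are illustrative assumptions, not the PR's actual definitions:

```rust
/// Illustrative sketch of a direct, offset-based batch layout.
/// All names here are hypothetical stand-ins for the PR's types.
struct OrdValBatchSketch<K, V, T, R> {
    /// Sorted, deduplicated keys.
    keys: Vec<K>,
    /// `keys_offs[i] .. keys_offs[i+1]` bounds the values for `keys[i]`,
    /// so this has length `keys.len() + 1`.
    keys_offs: Vec<usize>,
    /// Values, grouped by key.
    vals: Vec<V>,
    /// `vals_offs[j] .. vals_offs[j+1]` bounds the updates for `vals[j]`.
    vals_offs: Vec<usize>,
    /// Advanced, consolidated `(time, diff)` updates. Property 3 above is
    /// about abstracting this `Vec` behind a container trait; property 4
    /// is about compressing runs of repeated `(time, diff)` pairs.
    updates: Vec<(T, R)>,
}

impl<K, V, T, R> OrdValBatchSketch<K, V, T, R> {
    /// The updates recorded for the `j`-th value.
    fn updates_for(&self, j: usize) -> &[(T, R)] {
        &self.updates[self.vals_offs[j]..self.vals_offs[j + 1]]
    }
}
```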

This PR passes tests, but it probably wants a fair bit of exercise to see if it exactly tracks the existing implementations.

cc: @antiguru

Comment on lines +184 to +192
// Normally this would be `self.updates.len()`, but we have a clever compact encoding.
// Perhaps we should count such exceptions to the side, to provide a correct accounting.
frankmcsherry (Member Author)
No clever compact encoding yet, so ignore this comment.
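For what the comment above gestures at, a minimal sketch of counting such exceptions to the side, using hypothetical names invented for illustration rather than taken from the PR:

```rust
// Hypothetical wrapper sketching the "count exceptions to the side" idea:
// if a compact encoding elides repeated updates, the logical length is the
// stored length plus a separately maintained count of elided entries.
struct UpdateStorage<T, R> {
    /// Physically stored `(time, diff)` updates.
    updates: Vec<(T, R)>,
    /// Updates elided by the compact encoding, counted to the side so
    /// that `len()` can still report a correct logical total.
    elided: usize,
}

impl<T, R> UpdateStorage<T, R> {
    /// Logical number of updates: stored plus elided.
    fn len(&self) -> usize {
        self.updates.len() + self.elided
    }
}
```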

frankmcsherry (Member Author)

The most recent commit reorganizes the two spine implementations, and re-exports ord.rs's spines (the old ones) as ValSpine and KeySpine from their shared module root. All direct uses of e.g. OrdValSpine are replaced with ValSpine. The notable exception is columnation.rs which is meant to exercise columnation and non-columnation spines, so I left it pointing at the specific ones.
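A minimal sketch of that arrangement, with illustrative stand-in contents (the real modules define full spine types; only the re-export pattern is the point here):

```rust
// Sketch of the shared module root. Module and type bodies are
// illustrative stand-ins, not the PR's actual definitions.
pub mod ord {
    // The existing trie-layer spines.
    pub struct OrdValSpine;
    pub struct OrdKeySpine;
}
pub mod ord_neu {
    // The new direct implementations ("neu" in the benchmarks below).
    pub struct OrdValSpine;
    pub struct OrdKeySpine;
}

// The module root re-exports the old spines under neutral names, so
// downstream code writes `ValSpine`/`KeySpine` and the backing
// implementation can be swapped in one place later.
pub use ord::{OrdKeySpine as KeySpine, OrdValSpine as ValSpine};
```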

frankmcsherry (Member Author)

Some light performance investigation suggests that this is not universally worse than the ord.rs spine. Running the bfs example with n=10^6, m=2x10^6, batches of size 1000, 1000 rounds completes in:

ord    42.236413s
neu    39.531365375s

Running with n=10^8 and m=2x10^8, it takes both of them roughly the same time to load the data:

ord    38.825862291s
neu    37.227272667s

Anecdotally through Activity Monitor, it appeared that ord went up to about 7GB where neu stayed around 3GB. That was very unscientific on my part, though.

frankmcsherry merged commit b82c3ee into TimelyDataflow:master on Nov 20, 2023
1 check passed
frankmcsherry deleted the new_ord_batch branch on November 20, 2023 at 14:14