Integrate snapshotting into core #1344
Conversation
This commit integrates snapshotting into `RelationalDB`. In this initial version, we take a snapshot every 1 million txes written to the commitlog.
Part of this involves making the `committed_state` module `pub(crate)` where previously it was private. In order to preserve the previous visibility of its members, anything that was previously `pub` is now `pub(super)`.
Notable TODOs, some or all of which may be left as future work:
- Mark snapshots newer than the `durable_tx_offset` as invalid when restoring.
- Write a test that exercises snapshot capture and restore. This is challenging because of the large `SNAPSHOT_INTERVAL`; unless/until we make that configurable, such a test would have to commit 1 million txes.
- Make `SNAPSHOT_INTERVAL` configurable.
Durable DBs have a new dependency on Tokio in order to run the snapshot worker. This was causing tests to fail because the std test harness does not install a Tokio runtime. This commit fixes the test failures by installing a Tokio runtime around every test which constructs a durable DB.
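As a rough illustration of that pattern (a sketch only; `open_durable_db()` is a hypothetical stand-in for however the tests actually construct a durable database):

```rust
// Sketch: wrap the test body in an explicitly-created Tokio runtime, since the
// std test harness does not provide one. `open_durable_db` is a placeholder.
fn open_durable_db() { /* construct a durable RelationalDB here */ }

#[test]
fn durable_db_smoke_test() {
    // Build a runtime and enter it for the duration of the test body.
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime");
    let _guard = rt.enter();

    // Anything below may now spawn Tokio tasks (e.g. the snapshot worker).
    let _db = open_durable_db();
}
```

As noted further down, the later revision reuses the Tokio runtime that the `TestDB::durable` tests already have instead of building a second one.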
I just ran a test with my module:

```rust
use spacetimedb::{schedule, spacetimedb};

#[spacetimedb(table)]
pub struct Counter {
    #[primarykey]
    id: u64,
    count: u64,
}

#[spacetimedb(init)]
pub fn init() {
    Counter::insert(Counter { id: 0, count: 0 }).unwrap();
}

#[spacetimedb(reducer)]
pub fn count(repeat: bool) {
    let mut current = Counter::filter_by_id(&0).unwrap();
    current.count += 1;
    log::info!("Counted to {}", current.count);
    Counter::update_by_id(&0, current);
    if repeat {
        schedule!("0ms", count(true));
    }
}
```

The test was:
This worked and the host logged appropriately about restoring from a snapshot (which took ~500 µs to capture and < 1 ms to read and restore, since it contained only 8 pages). Curiously, significantly more TXes happened than I was expecting. I made 1989 calls to the reducer. Including connect/disconnect calls, this amounts to ~6000 TXes, but the tx offset of my snapshot was 7000. I'm not worried about this, but I am confused and intrigued.
This could be the extra disconnect tx introduced in #1288
Oh yeah, that'd do it. Thanks Kim!
My previous attempt to fix the tests failed to notice that we already have a Tokio runtime in all the `TestDB::durable` tests. This commit uses that instead of spawning a second one.
crates/core/src/db/relational_db.rs
Outdated
@@ -60,6 +65,7 @@ pub struct RelationalDB {
// TODO(cloutiertyler): This should not be public
pub(crate) inner: Locking,
durability: Option<Arc<dyn Durability<TxData = Txdata>>>,
snapshots: Option<Arc<SnapshotWorker>>,
Suggested change:
-    snapshots: Option<Arc<SnapshotWorker>>,
+    snapshotter: Option<Arc<SnapshotWorker>>,
perhaps?
I wonder if this should be owned by the `RelationalDB`, or if it would just be better to have this be something held by the `Host` or `DatabaseInstanceContext`, and to put `maybe_do_snapshot` in there as well. Although I haven't thought through exactly who should call `maybe_do_snapshot`.
I think it makes the most sense to store this in the `RelationalDB` because:
- It acts a lot like the `Durability`, which is also owned by the `RelationalDB`.
- Opportunities to call `maybe_do_snapshot` become increasingly tenuous as you get more abstracted than this.
- I would prefer not to leak the `CommittedState`, or even the `Locking`, any further than this.
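A self-contained sketch of that ownership and call site; the types below are stubs, and both `request_snapshot` and the exact trigger condition are assumptions for illustration, not the real SpacetimeDB code:

```rust
use std::sync::Arc;

// Stub standing in for the real snapshot worker; `request_snapshot` is a
// hypothetical name for "wake the worker and ask it to capture a snapshot".
struct SnapshotWorker;

impl SnapshotWorker {
    fn request_snapshot(&self, tx_offset: u64) {
        println!("capturing snapshot at tx offset {tx_offset}");
    }
}

const SNAPSHOT_INTERVAL: u64 = 1_000_000;

struct RelationalDB {
    // ...other fields (inner datastore, durability, ...) elided...
    snapshot_worker: Option<Arc<SnapshotWorker>>,
}

impl RelationalDB {
    /// Called on the commit path: if the committed tx offset has crossed a
    /// SNAPSHOT_INTERVAL boundary, ask the worker to capture a snapshot.
    fn maybe_do_snapshot(&self, committed_offset: u64) {
        if let Some(worker) = &self.snapshot_worker {
            if committed_offset % SNAPSHOT_INTERVAL == 0 {
                worker.request_snapshot(committed_offset);
            }
        }
    }
}

fn main() {
    let db = RelationalDB {
        snapshot_worker: Some(Arc::new(SnapshotWorker)),
    };
    db.maybe_do_snapshot(2_000_000); // would trigger a snapshot
}
```

Owning the worker next to the durability keeps the trigger close to where commits happen, which is the point being made above.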
I've renamed all the `snapshots` vars and members to either `snapshot_worker` or `snapshot_repo`, as appropriate.
crates/core/src/db/relational_db.rs
Outdated
Some((durability.clone(), disk_size_fn)),
Some(snapshot_repo),
)?
.apply(durability);
I think we will need to determine the `ConnectedClients` entirely from `st_clients`, as it appears right after this point.
Consider: the database crashed right after a snapshot, so the history suffix is empty -- but there might be connected clients in `st_clients`. Before we had snapshotting, an empty history meant an empty database.
I believe this is the case now that I've "rebased" onto latest master, including `st_clients`. Please verify that I'm doing this correctly.
`apply_history` returns the unmatched `__connect__`s during replay -- but what I meant was: what if the snapshot's tx offset is the last offset in the commitlog (hence the history is empty)?
Arguably an edge case, so fine by me to merge and fix in a later patch (I can submit one if you want).
Oh, I assumed it would read the connected clients out of the table. I thought we had done away with groveling the logs for connected clients. I think a follow-up would be good, and if you're willing to do so, that would be great!
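For illustration, a minimal sketch of building the dangling-client set from the table contents rather than from replay; the `ClientRow` layout and the scan helper are assumptions, not the actual `st_clients` schema or datastore API:

```rust
use std::collections::HashSet;

// Hypothetical row shape: identity and address of a client that was connected
// when the last tx (or snapshot) was written.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ClientRow {
    identity: [u8; 32],
    address: u128,
}

/// Build the set of clients that must be force-disconnected on startup by
/// scanning the table, so the result is correct even when the history suffix
/// after the snapshot is empty.
fn connected_clients(st_clients_rows: &[ClientRow]) -> HashSet<ClientRow> {
    st_clients_rows.iter().cloned().collect()
}

fn main() {
    let rows = vec![ClientRow { identity: [0; 32], address: 1 }];
    let dangling = connected_clients(&rows);
    println!("{} dangling client(s) to disconnect on startup", dangling.len());
}
```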
..
} = *committed_state;

if let Err(e) = snapshot_repo.create_snapshot(tables.values_mut(), blob_store, tx_offset) {
(Just an observation, nothing to do here: incidentally, the order of the tables will be deterministic due to `IntMap<TableId, _>`, but we do not rely on this determinism, and `.reconstruct_tables(..)` will yield a `BTreeMap<TableId, _>`, which would have resolved any non-determinism due to the stored order.)
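A tiny standalone example of that observation, using plain std types rather than the actual `IntMap`/`reconstruct_tables` code: a `BTreeMap` iterates in key order regardless of the order the entries were inserted (or stored) in, which is what erases any non-determinism in the stored order.

```rust
use std::collections::BTreeMap;

fn main() {
    // Insert table ids out of order, as they might come back from disk.
    let mut tables = BTreeMap::new();
    for table_id in [7u32, 2, 5, 1] {
        tables.insert(table_id, format!("table_{table_id}"));
    }
    // Iteration is always sorted by key: 1, 2, 5, 7.
    let order: Vec<u32> = tables.keys().copied().collect();
    assert_eq!(order, vec![1, 2, 5, 7]);
}
```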
I changed this for testing and accidentally committed at some point.
Rebased and performed manual testing. Appears to still work.
Apparently it gets installed in our `thread_spawn_handler` for Rayon. Possibly this changed recently, and it didn't happen when I wrote this change?
Nothing new to add wrt. latest changes :)
@@ -64,6 +69,7 @@ pub struct RelationalDB {
inner: Locking,
durability: Option<Arc<dyn Durability<TxData = Txdata>>>,
snapshot_worker: Option<Arc<SnapshotWorker>>,
I wonder if it would be best to move this outside of the `RelationalDB` (`DatabaseEngine`, future name), to something slightly higher level (e.g. `DatabaseInstanceContext`). That way we maintain that `RelationalDB` remains runtime-independent. I don't feel strongly enough about it to block the review, but it makes me uncomfortable to have the `DatabaseEngine`/`RelationalDB` dependent on Tokio.
To be clear, I think the ability to snapshot itself would remain inside of `RelationalDB`; I'm specifically just referring to the scheduling/triggering system.
The `Durability`, at least as implemented by `Local`, is also dependent on Tokio. As with the durability, we can avoid depending on Tokio by not supplying a `SnapshotRepository` when constructing the `RelationalDB`.
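A sketch of that construction-time choice; the constructor shape and field names are illustrative, not the actual `RelationalDB` constructor:

```rust
// Illustrative stubs: only when a snapshot repository (or durability) is
// supplied would a background worker, and hence a Tokio runtime, be involved.
struct Durability;
struct SnapshotRepository;

#[allow(dead_code)]
struct RelationalDB {
    durability: Option<Durability>,
    snapshot_repo: Option<SnapshotRepository>,
}

impl RelationalDB {
    fn open(durability: Option<Durability>, snapshot_repo: Option<SnapshotRepository>) -> Self {
        // Passing `None` for both keeps the database purely in-memory and
        // free of any Tokio dependency, since no worker is ever spawned.
        RelationalDB { durability, snapshot_repo }
    }
}

fn main() {
    // In-memory / test configuration: no durability, no snapshots, no Tokio.
    let _db = RelationalDB::open(None, None);
}
```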
}
}

fn take_snapshot(committed_state: &RwLock<CommittedState>, snapshot_repo: &SnapshotRepository) {
In reference to the scheduling comment above, I would imagine that this function would be added to the `RelationalDB` type.
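A sketch of what that could look like as a method; all types here are stubs, and the real `take_snapshot` hands over tables and the blob store rather than just a tx offset:

```rust
use std::sync::RwLock;

// Stubs for illustration only.
struct CommittedState { /* tables, blob store, ... elided */ tx_offset: u64 }
struct SnapshotRepository;

impl SnapshotRepository {
    fn create_snapshot(&self, tx_offset: u64) {
        println!("snapshot captured at tx offset {tx_offset}");
    }
}

struct RelationalDB {
    committed_state: RwLock<CommittedState>,
    snapshot_repo: Option<SnapshotRepository>,
}

impl RelationalDB {
    /// Method form of the free `take_snapshot` above: lock the committed
    /// state and hand it to the snapshot repository, if one was configured.
    fn take_snapshot(&self) {
        if let Some(repo) = &self.snapshot_repo {
            let state = self.committed_state.write().expect("poisoned lock");
            repo.create_snapshot(state.tx_offset);
        }
    }
}

fn main() {
    let db = RelationalDB {
        committed_state: RwLock::new(CommittedState { tx_offset: 1_000_000 }),
        snapshot_repo: Some(SnapshotRepository),
    };
    db.take_snapshot();
}
```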
.into());
}
let start = std::time::Instant::now();
let res = Locking::restore_from_snapshot(snapshot);
This will eventually need to be a datastore trait function.
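A hypothetical shape for such a trait function; the trait name, associated types, and signature are guesses for illustration, not the actual datastore traits:

```rust
// Sketch: restoring from a snapshot as a trait method, so callers need not
// name `Locking` directly. All names here are illustrative stand-ins.
trait Datastore: Sized {
    type Snapshot;
    type Error;

    /// Reconstruct the datastore from an on-disk snapshot instead of
    /// replaying the full commitlog.
    fn restore_from_snapshot(snapshot: Self::Snapshot) -> Result<Self, Self::Error>;
}

// Toy implementation showing how a concrete datastore would plug in.
struct Locking;
struct Snapshot;

impl Datastore for Locking {
    type Snapshot = Snapshot;
    type Error = String;

    fn restore_from_snapshot(_snapshot: Snapshot) -> Result<Self, String> {
        Ok(Locking)
    }
}

fn main() {
    let _db = Locking::restore_from_snapshot(Snapshot).expect("restore failed");
}
```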
This now looks good to me. 🙏
Checked the `table_size`- and `row_count`-related logic. Looks good.
When opening a `RelationalDB`, determine the set of dangling clients (`ConnectedClients`) by scanning the `st_clients` table instead of tracking unmatched `__identity_connected__` calls during history replay. We left the replay tracking in place in #1288, treating the commitlog as the sole source of truth. With #1344 (snapshotting), this is no longer correct: the snapshot may contain rows in `st_clients`, but leave no history suffix for replay.
Description of Changes

This PR integrates snapshotting into `RelationalDB`. In this initial version, we take a snapshot every 1 million txes written to the commitlog.

Part of this involves making the `committed_state` module `pub(crate)` where previously it was private. In order to preserve the previous visibility of its members, anything that was previously `pub` is now `pub(super)`.

Notable TODOs, some or all of which may be left as future work:
- Mark snapshots newer than the `durable_tx_offset` as invalid when restoring.
- Write a test that exercises snapshot capture and restore. This is challenging because of the large `SNAPSHOT_INTERVAL`; unless/until we make that configurable, such a test would have to commit 1 million txes.
- Make `SNAPSHOT_INTERVAL` configurable.

API and ABI breaking changes

Nothing broken, but we expose a new API and on-disk format which we will have to maintain in the future.

Expected complexity level and risk

4 - bootstrapping and replaying a database is complex and error-prone, and this PR adds significantly more surface area to that codepath which can break.

Testing
- Set `SNAPSHOT_INTERVAL` to a small number, possibly 1000, then publish your favorite module, run it to capture several snapshots, then restart, observe the host logs about the snapshot being restored, and verify the state is correct.
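If it helps with that kind of manual testing, here is one purely hypothetical way the interval could be made overridable, reading an environment variable once and falling back to the hard-coded default; the real constant is currently not configurable at all:

```rust
use std::sync::OnceLock;

/// Default production interval: one snapshot per million committed txes.
const DEFAULT_SNAPSHOT_INTERVAL: u64 = 1_000_000;

/// Hypothetical override hook for testing: read `SNAPSHOT_INTERVAL` from the
/// environment once, falling back to the default when unset or unparsable.
fn snapshot_interval() -> u64 {
    static INTERVAL: OnceLock<u64> = OnceLock::new();
    *INTERVAL.get_or_init(|| {
        std::env::var("SNAPSHOT_INTERVAL")
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(DEFAULT_SNAPSHOT_INTERVAL)
    })
}

fn main() {
    println!("snapshot every {} txes", snapshot_interval());
}
```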