Copy subgraph data in batches when grafting #2293
Conversation
This foreign schema thing is some dark sorcery!
store/postgres/migrations/2021-03-12-070453_add_copy_state/up.sql
store/postgres/src/catalog.rs
.filter(nsp::nspname.eq(namespace.as_str()))
.count()
.get_result::<i64>(conn)?
> 0)
You can do diesel::dsl::exists(nsp::table.filter(...)).
Neat! Changed.
store/postgres/src/catalog.rs
.filter(ft::foreign_table_schema.eq(src.namespace.as_str()))
.count()
.get_result::<i64>(conn)?
> 0;
here too.
Changed too
store/postgres/src/copy.rs
.filter(cs::src.eq(self.src.site.id))
.filter(cs::finished_at.is_null())
.count()
.get_result::<i64>(conn)?;
Again, exists.
Changed
out.push_sql(" where vid >= ");
out.push_bind_param::<BigInt, _>(&self.first_vid)?;
out.push_sql(" and vid < ");
out.push_bind_param::<BigInt, _>(&self.last_vid)?;
last_vid isn't really the last vid but one more than the last. I think this would be more readable if this were <= last_vid and last_vid was passed in as the actual last vid.
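The point about bound naming can be sketched in plain Rust; this is an illustration, not the actual graph-node code, and the function names are invented:

```rust
// With an exclusive upper bound, the field called "last_vid" is really one
// past the last vid that gets copied; with an inclusive bound the name is
// honest and the caller can pass the actual last vid of the batch.

// Exclusive style: copies vids in [first_vid, last_vid_exclusive)
fn batch_where_exclusive(first_vid: i64, last_vid_exclusive: i64) -> String {
    format!("where vid >= {} and vid < {}", first_vid, last_vid_exclusive)
}

// Inclusive style: copies vids in [first_vid, last_vid]
fn batch_where_inclusive(first_vid: i64, last_vid: i64) -> String {
    format!("where vid >= {} and vid <= {}", first_vid, last_vid)
}

fn main() {
    // Both clauses cover vids 0 through 99.
    assert_eq!(batch_where_exclusive(0, 100), "where vid >= 0 and vid < 100");
    assert_eq!(batch_where_inclusive(0, 99), "where vid >= 0 and vid <= 99");
}
```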
True; I changed that.
store/postgres/src/copy.rs
.load::<MaxVid>(conn)?
.first()
.map(|v| v.max_vid)
.unwrap_or(0);
If the table is empty, I believe the target_vid should be -1.
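A minimal sketch of why -1 is the right default (illustrative names, not the actual graph-node code): target_vid comes from a max(vid) query, and an empty table has no max.

```rust
// If the source table is empty, defaulting to 0 would make the copy loop
// attempt one pointless batch for vid = 0; defaulting to -1 makes the copy
// finish immediately, since no vid can be <= -1.
fn target_vid(max_vid: Option<i64>) -> i64 {
    max_vid.unwrap_or(-1)
}

fn main() {
    assert_eq!(target_vid(Some(41)), 41); // non-empty table: copy up to vid 41
    assert_eq!(target_vid(None), -1);     // empty table: nothing to copy
}
```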
True; that saves an unnecessary attempt to copy vid = 0.
store/postgres/src/copy.rs
}

fn finished(&self) -> bool {
    self.next_vid >= self.target_vid
My understanding is that next_vid has not yet been copied, and we want to copy up to and including target_vid, so we're only finished after next_vid > target_vid.
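The off-by-one can be seen in a standalone sketch (again illustrative, not the actual code):

```rust
// next_vid is the first vid NOT yet copied; target_vid is the last vid that
// must be copied (inclusive). The copy is only finished once next_vid has
// moved strictly past target_vid, so the check must be >, not >=.
fn finished(next_vid: i64, target_vid: i64) -> bool {
    next_vid > target_vid
}

fn main() {
    assert!(!finished(100, 100)); // vid 100 still needs to be copied
    assert!(finished(101, 100));  // everything up to target_vid is done
    assert!(finished(0, -1));     // empty source table (target_vid == -1)
}
```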
Yes, I made changes to that logic yesterday - I went from current_vid to next_vid because I thought it would be clearer, but didn't catch all the places that affected. Changed.
store/postgres/src/copy.rs
self.dst.as_ref(),
&self.src,
self.next_vid,
self.next_vid + self.batch_size,
We're not limiting by target_vid here; I think that'd be good to make the behavior more predictable.
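The suggested fix amounts to clamping the batch's upper bound. A sketch under assumed inclusive bounds (names are illustrative, not the actual graph-node code):

```rust
// Clamp the end of a batch by target_vid so a batch never reaches past the
// data that existed when the copy started. Returns inclusive (first, last)
// bounds for one batch.
fn batch_bounds(next_vid: i64, batch_size: i64, target_vid: i64) -> (i64, i64) {
    let last = (next_vid + batch_size - 1).min(target_vid);
    (next_vid, last)
}

fn main() {
    // A full batch in the middle of the copy...
    assert_eq!(batch_bounds(0, 1000, 5000), (0, 999));
    // ...and a clamped final batch that stops exactly at target_vid.
    assert_eq!(batch_bounds(4500, 1000, 5000), (4500, 5000));
}
```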
Yes, that's actually a pretty serious bug since we would end up copying data we should not be copying.
dst.revert_block(&conn, &dst.site.deployment, block_to_revert)?;
Layout::revert_metadata(&conn, &dst.site.deployment, block_to_revert)?;
info!(logger, "Rewound subgraph to block {}", block.number;
    "time_ms" => start.elapsed().as_millis());
It'd be good to see more justification for this step in the comment.
I've actually been meaning to get rid of this and simply forgot. It was left over from when we copied everything and then got rid of the excess because copying the complex metadata structure up to just some block was too cumbersome. I've removed that in favor of copying just what we need.
As it turns out, reverting the data is important since it needs to unclamp versions that are deleted in the future. In theory, CopyEntityBatchQuery could do that during copying, but the SQL to do that becomes pretty unwieldy. Maybe as an optimization in the future.
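The unclamping idea can be shown with a toy model of versioned rows; this is a sketch of the concept, with invented types, not the actual store schema:

```rust
// A version whose block range was closed (clamped) at a block beyond the
// graft point is "deleted in the future" from the graft's perspective, so a
// revert must reopen (unclamp) it; versions created after the graft point
// must be deleted outright.
#[derive(Debug, PartialEq)]
struct Version {
    lower: i32,         // block at which this version was created
    upper: Option<i32>, // block at which it was deleted; None = still live
}

fn revert_to(versions: Vec<Version>, block: i32) -> Vec<Version> {
    versions
        .into_iter()
        // drop versions created after the revert block
        .filter(|v| v.lower <= block)
        // unclamp versions deleted after the revert block
        .map(|mut v| {
            if v.upper.map_or(false, |u| u > block) {
                v.upper = None;
            }
            v
        })
        .collect()
}

fn main() {
    let versions = vec![
        Version { lower: 5, upper: Some(20) }, // deleted "in the future"
        Version { lower: 25, upper: None },    // created after the graft point
    ];
    // Reverting to block 10 keeps the first version and makes it live again.
    assert_eq!(revert_to(versions, 10), vec![Version { lower: 5, upper: None }]);
}
```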
Ooh tricky
}
None => None,
};
store.start_subgraph(logger, site, graft_base)
store.start_subgraph(logger, site.clone(), graft_base)?;
self.primary_conn()?.copy_finished(site.as_ref())
It reads a bit weird to execute this unconditionally even though most subgraphs won't be copied when deployed; maybe this copy_finished call can be moved into start_subgraph?
Unfortunately, this can't go into start_subgraph since it manipulates a different database.
Added more commits to address all the review comments.
let start = Instant::now();
let block_to_revert: BlockNumber = (block.number + 1)
    .try_into()
    .expect("block numbers fit into an i32");
dst.revert_block(&conn, &dst.site.deployment, block_to_revert)?;
Layout::revert_metadata(&conn, &dst.site.deployment, block_to_revert)?;
Note to self: Remember this interaction if we ever add support for removing data sources.
I made one very small change to this PR: moving
Fixed a mistake in the
70062ae
to
d20df11
Compare
These changes make it so that the data copy for grafting is done in batches to avoid long-running transactions. It is now also permissible to graft subgraphs across shards.
Note that this PR sits on top of #2268 since it relies on its changes.
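The overall batching scheme described above can be sketched in a few lines of plain Rust; the transaction handling is elided and the names are illustrative, not the actual graph-node API:

```rust
// Instead of copying all of the source data in one long-running transaction,
// copy ranges of vids in separate short transactions until target_vid is
// reached. Returns the inclusive (first, last) vid range of each batch.
fn plan_batches(mut next_vid: i64, target_vid: i64, batch_size: i64) -> Vec<(i64, i64)> {
    let mut batches = Vec::new();
    while next_vid <= target_vid {
        let last = (next_vid + batch_size - 1).min(target_vid);
        batches.push((next_vid, last)); // one short transaction per batch
        next_vid = last + 1;
    }
    batches
}

fn main() {
    // 2500 rows copied as three batches of at most 1000 vids each.
    assert_eq!(
        plan_batches(0, 2499, 1000),
        vec![(0, 999), (1000, 1999), (2000, 2499)]
    );
    // An empty source table (target_vid == -1) yields no batches at all.
    assert!(plan_batches(0, -1, 1000).is_empty());
}
```

Running each batch in its own transaction keeps any single transaction short, which is what makes cross-shard grafts practical.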