-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fastx client] Adding robust client <-> authorities primitives #336
Conversation
9bb93c5
to
14bf67a
Compare
Hey @patrickkuo -- I promised to make this available for review, but I am still trying to finalise the |
662edd5
to
3643466
Compare
5d8ca5c
to
dbdf6b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some nits, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks like an interesting direction. A couple of comments with this first pas of review:
- unit tests for
get_object_by_id
,get_all_owned_objects
,sync_all_owned_objects
would help .. or at least smaller tests. The tests that are here (thank you!) are integration tests. They're a journey, I would love something more bite-sized for daily reading. - anything that lowers the complexity of
quorum_map_then_reduce_with_timeout
is good. I left a few suggestions, but I'll try to help more on the 2d pass, especially with thereduce
fastpay_core/src/client.rs
Outdated
) | ||
.await; | ||
|
||
// TODO: log or report the errors if there is one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracing
allows you to de-couple the logging from the publishing of the logs, which means you can leave it to a further PR to actually figure out how to display things, and actually get on with figuring out interesting messages to communicate.
I wonder: is there something we could do to make the bar to entry easier and make sure logging / tracing is not left as a TODO in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracing communicates things to the devops person baby sitting a server, or user process (actually logs on the user side are usually for debug, no user should look there). Here what needs doing is collect errors, and send them to the part of the client that can do something about them: ie deprioritize interactions with this authority, etc.
I changed the comment to:
// TODO: collect errors and propagate them to the right place
Your reflection on how to systematically log is still very valid.
fastpay_core/src/client.rs
Outdated
/// This function provides a flexible way to communicate with a quorum of authorities, processing and | ||
/// processing their results into a safe overall result, and also safely allowing operations to continue | ||
/// past the quorum to ensure all authorities are up to date (up to a timeout). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A doc example here would increase the usability of the function, especially if it shows how to make use of timeouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now have quite a few examples in the unit tests, including using timeouts. We can try to adapt them to a doc example (I do not know how to set up the scaffolding as we do in tests for docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scaffolding is there already, it's a Rust built-in, cargo test
will run your doc tests at the end.
Many thanks for the reviews @patrickkuo & @huitseeker -- you have identified some key issues to fix. Will do and respond by tomorrow. |
fastpay_core/src/client.rs
Outdated
( | ||
BTreeMap< | ||
(ObjectRef, TransactionDigest), | ||
(Option<Object>, Vec<(AuthorityName, Option<SignedOrder>)>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to pass around the reference to the Object
here instead of the whole object since they could be arbitrarily huge.
(in the meantime, Sam and I are trying to define a reasonable upper bound for object sizes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The objects can be huge because they contain large allocations in the heap. However, moving an Object around (except if we clone or if it implements the copy trait) does not actually copy the allocation, but just the fixed size stack allocated part I think.
3e36713
to
3f9362e
Compare
@huitseeker I have added unit tests as you suggested, but for Do we have experience, or should we use something like: https://docs.rs/mockall/latest/mockall/ ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may be an issue here:
When we delete an object by wrapping it to another object, we insert (obj_id, seq+1, ..) to parent_sync
, but we don't actually mutate the sequence number of the object (it stays at seq). When that same object gets unwrapped, its sequence number becomes seq+1 and now parent_sync
has two entries for the same object_id + seq.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To @lxfind's comment: do we not clobber the original deleted object recording when unwrapping?
fastpay_core/src/client.rs
Outdated
if let Ok(response) = result { | ||
let certificate = response | ||
.parent_certificate | ||
.expect("Unable to get certificate"); | ||
let certificate = if let Some(certificate) = response.parent_certificate { | ||
certificate | ||
} else { | ||
continue; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Ok(ObjectInfoResponse {
parent_certificate: Some(certificate),
..
}) = result
{
fastpay_core/src/client.rs
Outdated
// NOTE: This implies this is a genesis object. We should check that it is. | ||
// We can do this by looking into the genesis, or the object_refs of the genesis. | ||
// Otherwise report the authority as potentially faulty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably calling for a specific is_genesis
function somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it does. It also calls for checking all information the client downloads from an authority, something that is done in at best an ah-hoc way right now, if at all.
fastpay_core/src/client.rs
Outdated
if total_stake.1 > validity { | ||
return Err(FastPayError::TooManyIncorrectAuthorities); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very underwhelming. 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I do not know what to do about this. If its the name that is underwhelming, we can change it to something else. If the amount of reporting is underwhelming, then it will have to be a separate PR. We cannot block all progress on the known issue that error reporting on the client is lacking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for sure, there's no blocking issue here.
Remove annotation Implemented get_object_by_ref Download objects at version Added tests and fix corner cases Return certificate with retrieved object Added docs Finished sync_all_owned_objects Docs Return available objects Add an assert that we only side insert genesis objects Added a an authority test with a deleted object Add deleted objects to the parent index Added get_latest_parent_entry & tests ObjectInfoResponse sends the latest certificate Added requested_object_reference to ObjectInfoResponse Fixed bug in query() Fix sync_all_owned_objects Added tests for the client full sync Make fmt and clippy happy Clean up comments Removed unused function Make fmt happy Add docs Clean up tests Removed wrapped Remove s.truncate Fixed rebase errors Make get_object_by_id return pending orders clippy + fmt Changes followin review Added unit tests for map/reducer
7b56a00
to
ee0d7f7
Compare
Yes, this issue is very real, and either we have to deal with it by incrementing the version correctly, or implementing the lamport timestamps. Either should work. |
In this PR we build a library of primitives for robust interactions between the client and the set of authorities, covering failure cases in the general model of orders beyond transfers.
Specifically:
[Client logic]
[Authority logic]
parent_sync
when the object is deleted, and a link back to the cert that deleted it.ObjectRef
for an object id, and the certificate that leads to it, including for deleted objects.For another PR: