-
Notifications
You must be signed in to change notification settings - Fork 28
Conversation
87be131
to
f710ace
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good! some minor comments below.
about two code style choicesregarding imports i noticed while reading (this is not so much related to only this mr but more in general for all of us):
- can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line? this would help readability.
use std::{...};
use sodiumoxide::{...};
use derive_more::{...};
use crate::{:::};
use self::{...};
- can we agree on explicit imports only? the only exception from this rule could be
use super::*;
for test modules. i find it quite annoying to hunt down references to source code, especially when reviewing new code and when these import trails are further obfuscated by*
re-exports, so this would help readability. it also avoids unwanted name conflicts as a side effect.
Thanks for the review!
Yeah I actually follow this rule, except when I want to re-export an entire sub-module in a mod inner;
pub use self::inner::*; Is that what you're talking about specifically? I'm not against not using this pattern. I think there's even a clippy lint for it (funnily, I opened the first clippy issue for that: rust-lang/rust-clippy#1228)
I'm a bit less inclined to commit to this because:
We could agree on doing that on a "best effort" basis though. |
Not sure what editor you use but vscode, vim and emacs allow you do jump directly to a type of function definition. I think vscode even allows you nowadays to expand these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
yes that's basically my main issue. i usually do code reviews right from the github web interface to add comments directly, therefore following imports is not straight forward. i agree that this is a non-issue for IDEs.
related to tracing import outside of IDEs it helps having structure there, but i agree that it is hard to enforce. best effort would be a good compromise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a very thorough refactoring job! 🥇
5239ef0
to
50b641f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review, i hope github doesn't eat my comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i finished reviewing the mask refactoring, looks good so far! this is quite an enormous refactoring, i think i got the broad hint to submit smaller merge requests myself ;P
next to some smaller issues it seems that some of the pet logic of the coordinator and participant went missing in the process.
let mut data = writer.data_mut(); | ||
let bytes_per_digit = self.config.bytes_per_digit(); | ||
|
||
for int in self.data.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would suggest to use an enumerate()
in the loop, then the reassignment of the data
slice in line 112 can be avoided, instead the data
buffer would be indexed (which might help readability):
for (idx, int) in self.data.iter().enumerate() {
let bytes = int.to_bytes_le();
data[range(idx*bytes_per_digit, bytes.len())].copy_from_slice(bytes.as_slice());
data[range(idx*bytes_per_digit+bytes.len(), (idx+1)*bytes_per_digit)].copy_from_slice([0_u8; bytes_per_digit-bytes.len()].as_ref());
}
91b3289
to
c460eca
Compare
Summary: ======== 1. add a `Header` type for the common fields 2. add a `LengthValueBuffer` type to handle the variable length fields 3. decouple the crypto and parsing parts 4. add a `Message` type that wraps sum, update and sum2 messages 5. have an `Owned` and a `Borrowed` version of every type 7. small bug fixes & improvements: - not using `usize` as a field since it's platform dependent - detection of truncated local seed dictionary - `XxxBuffer` does not allocate a `Vec` - remove impls for specific types (`impl TryFrom<Vec<u8>> for XxxBuffer`, `impl XxxBuffer<Vec<u8>>` etc) - make the `XxxBuffer()` APIs more consistent, opening the door to code generation via macros - use a custom `DecodeError` type that contains the whole stack of errors, which makes debugging easier Details: ======== 1. Introduce a `Header` type for the common fields -------------------------------------------------- All the messages share common fields. In networking protocol, these common fields are usually handled separately from the rest of the message. They are usually called headers, and the rest of the message is the payload. We defined the following header: ```rust /// A header, common to all the message pub struct Header<CK, PK, C> { /// Type of message pub tag: Tag, /// Coordinator public key pub coordinator_pk: CK, /// Participant public key pub participant_pk: PK, /// A certificate that identifies the author of the message pub certificate: Option<C>, } ``` Currently the code to handle these common fields lives in the `MessageBuffer` trait which is implemented for each message type, thus reducing boilerplate for parsing these fields. I can see several downsides to using a trait for this though: a. the `serialize()` and `deserialize()` methods must call the `MessageBuffer` methods for each of these common fields. b. it is difficult to handle variable length fields in the header (which maybe is why the optional certificate is handled by each message separately?) c. tests require full messages `a.` is not too much of a problem, and I'm not sure to what extent `b.` holds true. But removing the header logic does lead to much simpler tests. 2. Add a `LengthValueBuffer` type to handle the variable length fields ---------------------------------------------------------------------- This reduces error prone boilerplate, like we had in `update.rs`: ```rust if buffer.len() >= Self::LOCAL_SEED_DICT_LEN_RANGE.end { buffer.certificate_range = Self::LOCAL_SEED_DICT_LEN_RANGE.end ..Self::LOCAL_SEED_DICT_LEN_RANGE.end + usize::from_le_bytes(buffer.certificate_len().try_into().unwrap()); buffer.masked_model_range = buffer.certificate_range.end ..buffer.certificate_range.end + usize::from_le_bytes(buffer.masked_model_len().try_into().unwrap()); buffer.local_seed_dict_range = buffer.masked_model_range.end ..buffer.masked_model_range.end + usize::from_le_bytes(buffer.local_seed_dict_len().try_into().unwrap()); } ``` This also allows us to automate the serialization/deserialization of length-variable types like masks, certificates and masked models (see `impl_traits_for_length_value_types!`) 3. decouple the crypto and parsing parts ---------------------------------------- The message signature and encryption is the very last/first step for every message, and is always exactly the same, so it makes sense to keep it separate. Therefore we removed the `open()` and `seal()` methods for the messages themself and moved the logic to two dedicated types: `MessageSeal` for signing and encrypting messages, and `MessageOpener` for decrypting and verifying message signatures. With this, we could now move the crypto logic to the _transport_ layer, and only handle fully fledged messages in the business logic layer. 4. add a `Message` type that wraps sum, update and sum2 messages ---------------------------------------------------------------- ```rust pub struct Message { pub header: Header, pub payload: Payload, } pub enum Payload { Sum(SumMessage), Sum2(Sum2Message), Update(UpdateMessage), } ``` This is actually needed if we want to move message serialization/deserialization to Tokio. By doing so, we don't have to repeat the same code for the fields that are common to all the messages. 5. have an `Owned` and a `Borrowed` version of every type --------------------------------------------------------- Each type comes in two flavours: `Owned` and `Borrowed`. An `XxxOwned` type _owns_ their fields, while an `XxxBorrowed` type may only have references to some of the fields. The `Borrowed` variant is needed because we have some potentially large fields that we don't want to clone when emitting a message. For instance, it would be wasteful for an update participant sending an update message with a large local seed dictionary to clone that dictionary.
a293c2f
to
f33751f
Compare
29d6e55
to
9b88ffb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the latest looks very good to me (although the size and complexity of this merge means i'm not confident i've covered it all!). maybe you could finish off the merge message to summarise the changes specific to masking, when you have time later on (or write it in a separate note).
a916916
to
2a1eb37
Compare
Thanks for reviewing. I'll update the commit message and PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few further (minor) comments, just documentation points and a general question. feel free to deal with later at your leisure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the counting issue for Aggregation
should be addressed before merging, other than that it's good to go!
rust/src/mask/masking.rs
Outdated
|
||
/// Generate a secure pseudo-random integer. Draws from a uniform distribution over the integers | ||
/// between zero (included) and `max_int` (excluded). | ||
pub fn generate_integer(prng: &mut ChaCha20Rng, max_int: &BigUint) -> BigUint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is still the case, we should keep it in one place.
2a1eb37
to
d1d06f5
Compare
This commit is very large, so I can't list of the changes that were made. Here are the most important ones. More compact BigUint encoding for mask objects ---------------------------------------------- Instead encoding each integer with its length, encode all the integers with on the same number of bytes. This doesn't waste too much space due to the random nature of data: the probability that we waste the `n` most significant bits is 1/2^n. Do not use generics to encode the primitive type used in the models ------------------------------------------------------------------- Using `Model<N>` forces us to chose at compile time whether our models will use f32, f64, i32 or i64. Such restriction is not acceptable for our use case. **Important**: This PR does not completely get rid of `Model<N>` yet! It only limits its reach: for instance the coordinator is not generic over `N` anymore. Do not use macros to implement getters/setters ---------------------------------------------- Getters/setters are not idiomatic in Rust. See this reddit discussion: https://www.reddit.com/r/rust/comments/6etrr1/a_derive_for_your_basic_getters_and_setters/ Make cloning of models and mask objects more explicit ----------------------------------------------------- - Use APIs that take ownership of mask object and models: masking, unmasking and aggregation are operations that _consume_ their input. If the caller want to copy the data before-hand it is their responsability - Use iterators instead of vectors, for instance for model conversion. Currently we still collect these iterators into vectors, but in the future we could try pass them around. `Iterator.map` being lazy, this could result in decent performance gains. Using iterators everywhere would still require a lot of work: currently, mask objects are vectors for instance. This commit is just a first step in that direction.
d1d06f5
to
5e957d8
Compare
This was removed by mistake in #398
This is a first step toward performing serialization/deserialization in a separate transport layer. It is not finished, but since some changes may be controversial I figured I'd open it a bit early to start the discussion. It is easier to review this with the documentation (although this is not yet fully documented).
The diff is slightly positive but this is mostly due to the additional docstrings.
Otherwise, actual number of line of code went down a littleedit: disregard this comment, I think this is mostly due to some tests missing. The line count is roughly the same.TODO:
Header
tests and docMessageBuffer
tests and docPayload
tests and docSummary:
Header
type for the common fieldsLengthValueBuffer
type to handle the variable length fieldsMessage
type that wraps sum, update and sum2 messagesOwned
and aversion of every typeBorrowed
usize
as a field since it's platform dependentXxxBuffer
does not allocate aVec
impl TryFrom<Vec<u8>> for XxxBuffer
,impl XxxBuffer<Vec<u8>>
etc)XxxBuffer()
APIs more consistent, opening the doorto code generation via macros
DecodeError
type that contains the whole stackof errors, which makes debugging easier
Details:
1 Introduce a
Header
type for the common fieldsAll the messages share common fields. In networking protocol, these
common fields are usually handled separately from the rest of the
message. They are usually called headers, and the rest of the message
is the payload. We defined the following header:
Currently the code to handle these common fields lives in the
MessageBuffer
trait which is implemented for each message type, thusreducing boilerplate for parsing these fields. I can see several
downsides to using a trait for this though:
a. the
serialize()
anddeserialize()
methods must call theMessageBuffer
methods for each of these common fields.b. it is difficult to handle variable length fields in the
header (which maybe is why the optional certificate is handled by
each message separately?)
c. tests require full messages
a.
is not too much of a problem, and I'm not sure to what extentb.
holds true. But removing the header logic does lead to muchsimpler tests.
2 Add a
LengthValueBuffer
type to handle the variable length fieldsThis reduces error prone logic, like we had in
update.rs
:This also allows us to automate the serialization/deserialization of
length-variable types like masks, certificates and masked models (see
impl_traits_for_length_value_types!
)3 decouple the crypto and parsing parts
The message signature and encryption is the very last/first step for
every message, and is always exactly the same, so it makes sense to
keep it separate. Therefore we removed the
open()
andseal()
methods for the messages themself and moved the logic to two dedicated
types:
MessageSeal
for signing and encrypting messages, andMessageOpener
for decrypting and verifying message signatures.With this, we could now move the crypto logic to the transport
layer, and only handle fully fledged messages in the business logic
layer.
4 add a
Message
type that wraps sum, update and sum2 messagesThis is actually needed if we want to move message
serialization/deserialization to Tokio.
By doing so, we don't have to repeat the same code for the fields that
are common to all the messages.