Refactor messages and masking #398

little-dude · 2020-05-04T07:54:54Z

This is a first step toward performing serialization/deserialization in a separate transport layer. It is not finished, but since some changes may be controversial I figured I'd open it a bit early to start the discussion. It is easier to review this with the documentation (although this is not yet fully documented).

The diff is slightly positive but this is mostly due to the additional docstrings. ~~Otherwise, actual number of line of code went down a little~~ edit: disregard this comment, I think this is mostly due to some tests missing. The line count is roughly the same.

TODO:

Summary:

add a Header type for the common fields
add a LengthValueBuffer type to handle the variable length fields
decouple the crypto and parsing parts
add a Message type that wraps sum, update and sum2 messages
have an Owned ~~and a Borrowed~~ version of every type
small bug fixes & improvements:
- not using usize as a field since it's platform dependent
- detection of truncated local seed dictionary
- XxxBuffer does not allocate a Vec
- remove impls for specific types (impl TryFrom<Vec<u8>> for XxxBuffer, impl XxxBuffer<Vec<u8>> etc)
- make the XxxBuffer() APIs more consistent, opening the door
  to code generation via macros
- use a custom DecodeError type that contains the whole stack
  of errors, which makes debugging easier
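To illustrate the last point, here is a minimal sketch of an error type that carries the whole stack of failures. This is not the actual `DecodeError` from this PR: the `context()` method, the `parse_length()` and `parse_certificate()` helpers and the error strings are all assumptions made up for the example.

```rust
use std::fmt;

/// Hypothetical error type: a stack of messages, innermost failure first.
#[derive(Debug, PartialEq)]
pub struct DecodeError {
    stack: Vec<String>,
}

impl DecodeError {
    /// Create an error from the innermost failure.
    pub fn new(msg: impl Into<String>) -> Self {
        DecodeError { stack: vec![msg.into()] }
    }

    /// Push one more layer of context on top of the existing stack.
    pub fn context(mut self, msg: impl Into<String>) -> Self {
        self.stack.push(msg.into());
        self
    }
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the outermost context first, then each underlying cause.
        for (depth, msg) in self.stack.iter().rev().enumerate() {
            if depth > 0 {
                write!(f, ": ")?;
            }
            write!(f, "{}", msg)?;
        }
        Ok(())
    }
}

impl std::error::Error for DecodeError {}

/// Parse a little-endian `u32` length field (hypothetical helper).
pub fn parse_length(bytes: &[u8]) -> Result<u32, DecodeError> {
    if bytes.len() < 4 {
        return Err(DecodeError::new(format!(
            "expected 4 bytes, got {}",
            bytes.len()
        )));
    }
    let mut field = [0u8; 4];
    field.copy_from_slice(&bytes[..4]);
    Ok(u32::from_le_bytes(field))
}

/// Each layer adds its own context as the error bubbles up.
pub fn parse_certificate(bytes: &[u8]) -> Result<u32, DecodeError> {
    parse_length(bytes).map_err(|e| e.context("invalid certificate length"))
}
```

With this shape, a failure deep inside a nested field reports every layer it went through, which is the debugging benefit mentioned above.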

Details:

1 Introduce a `Header` type for the common fields

All the messages share common fields. In networking protocols, these
common fields are usually handled separately from the rest of the
message. They are usually called headers, and the rest of the message
is the payload. We defined the following header:

/// A header, common to all the messages
pub struct Header<CK, PK, C> {
    /// Type of message
    pub tag: Tag,
    /// Coordinator public key
    pub coordinator_pk: CK,
    /// Participant public key
    pub participant_pk: PK,
    /// A certificate that identifies the author of the message
    pub certificate: Option<C>,
}

Currently the code to handle these common fields lives in the
MessageBuffer trait which is implemented for each message type, thus
reducing boilerplate for parsing these fields. I can see several
downsides to using a trait for this though:

a. the serialize() and deserialize() methods must call the
MessageBuffer methods for each of these common fields.
b. it is difficult to handle variable length fields in the
header (which maybe is why the optional certificate is handled by
each message separately?)
c. tests require full messages

a. is not too much of a problem, and I'm not sure to what extent
b. holds true. But removing the header logic does lead to much
simpler tests.
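As a rough illustration of point c., the sketch below serializes and parses a header on its own, without building a full message. It uses a concrete, non-generic `Header` for brevity. The wire layout (1-byte tag, two 32-byte keys, a 4-byte little-endian certificate length, then the certificate), the `PK_LEN` constant and the tag values are assumptions for the example, not the format implemented in this PR.

```rust
/// Hypothetical key length; the real key types live in the crypto module.
pub const PK_LEN: usize = 32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    Sum = 1,
    Update = 2,
    Sum2 = 3,
}

#[derive(Debug, PartialEq)]
pub struct Header {
    pub tag: Tag,
    pub coordinator_pk: [u8; PK_LEN],
    pub participant_pk: [u8; PK_LEN],
    pub certificate: Option<Vec<u8>>,
}

impl Header {
    /// Assumed layout: tag | coordinator_pk | participant_pk | cert_len (u32 LE) | cert.
    pub fn to_bytes(&self) -> Vec<u8> {
        let cert: &[u8] = self.certificate.as_deref().unwrap_or(&[]);
        let mut out = Vec::with_capacity(1 + 2 * PK_LEN + 4 + cert.len());
        out.push(self.tag as u8);
        out.extend_from_slice(&self.coordinator_pk);
        out.extend_from_slice(&self.participant_pk);
        out.extend_from_slice(&(cert.len() as u32).to_le_bytes());
        out.extend_from_slice(cert);
        out
    }

    /// Parse a header and return it with the remaining (payload) bytes.
    /// An empty certificate is decoded as `None`.
    pub fn from_bytes(bytes: &[u8]) -> Option<(Header, &[u8])> {
        const FIXED: usize = 1 + 2 * PK_LEN + 4;
        if bytes.len() < FIXED {
            return None;
        }
        let tag = match bytes[0] {
            1 => Tag::Sum,
            2 => Tag::Update,
            3 => Tag::Sum2,
            _ => return None,
        };
        let mut coordinator_pk = [0u8; PK_LEN];
        coordinator_pk.copy_from_slice(&bytes[1..1 + PK_LEN]);
        let mut participant_pk = [0u8; PK_LEN];
        participant_pk.copy_from_slice(&bytes[1 + PK_LEN..1 + 2 * PK_LEN]);
        let mut len = [0u8; 4];
        len.copy_from_slice(&bytes[1 + 2 * PK_LEN..FIXED]);
        // Reject headers whose certificate is longer than the buffer.
        let end = FIXED.checked_add(u32::from_le_bytes(len) as usize)?;
        let cert = bytes.get(FIXED..end)?;
        let certificate = if cert.is_empty() { None } else { Some(cert.to_vec()) };
        let header = Header { tag, coordinator_pk, participant_pk, certificate };
        Some((header, &bytes[end..]))
    }
}
```

A test for this only needs a `Header` value and a byte slice; whatever follows the header is handed back untouched for the payload parser.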

2 Add a `LengthValueBuffer` type to handle the variable length fields

This reduces error-prone logic, like we had in update.rs:

if buffer.len() >= Self::LOCAL_SEED_DICT_LEN_RANGE.end {
    buffer.certificate_range = Self::LOCAL_SEED_DICT_LEN_RANGE.end
        ..Self::LOCAL_SEED_DICT_LEN_RANGE.end
            + usize::from_le_bytes(buffer.certificate_len().try_into().unwrap());
    buffer.masked_model_range = buffer.certificate_range.end
        ..buffer.certificate_range.end
            + usize::from_le_bytes(buffer.masked_model_len().try_into().unwrap());
    buffer.local_seed_dict_range = buffer.masked_model_range.end
        ..buffer.masked_model_range.end
            + usize::from_le_bytes(buffer.local_seed_dict_len().try_into().unwrap());
}

This also allows us to automate the serialization/deserialization of
variable-length types like masks, certificates and masked models (see
impl_traits_for_length_value_types!).
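For readers who have not opened the diff, the idea can be sketched roughly as follows. This is a simplified stand-in rather than the `LengthValueBuffer` from this PR: the 4-byte little-endian length prefix, the choice to count only the value bytes in the length, the `BufferError` type and the `rest()` helper are all assumptions for the example.

```rust
use std::ops::Range;

/// Position of the length field. A fixed-width `u32` is used instead of
/// `usize`, whose size depends on the platform.
const LENGTH_FIELD: Range<usize> = 0..4;

#[derive(Debug, PartialEq)]
pub enum BufferError {
    MissingLength,
    Truncated { expected: usize, actual: usize },
}

/// Hypothetical wrapper around any byte container holding
/// `length (u32 LE) | value (length bytes) | ...`.
pub struct LengthValueBuffer<T> {
    inner: T,
}

impl<T: AsRef<[u8]>> LengthValueBuffer<T> {
    /// Wrap `inner`, checking once that the announced value fits.
    pub fn new(inner: T) -> Result<Self, BufferError> {
        let buffer = LengthValueBuffer { inner };
        let actual = buffer.inner.as_ref().len();
        if actual < LENGTH_FIELD.end {
            return Err(BufferError::MissingLength);
        }
        let expected = LENGTH_FIELD.end.saturating_add(buffer.value_len());
        if actual < expected {
            return Err(BufferError::Truncated { expected, actual });
        }
        Ok(buffer)
    }

    /// Length of the value, as announced by the length field.
    pub fn value_len(&self) -> usize {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&self.inner.as_ref()[LENGTH_FIELD]);
        u32::from_le_bytes(raw) as usize
    }

    /// The value itself, without the length prefix.
    pub fn value(&self) -> &[u8] {
        let end = LENGTH_FIELD.end + self.value_len();
        &self.inner.as_ref()[LENGTH_FIELD.end..end]
    }

    /// Bytes that follow this field, e.g. the next length-value field.
    pub fn rest(&self) -> &[u8] {
        &self.inner.as_ref()[LENGTH_FIELD.end + self.value_len()..]
    }
}
```

Consecutive variable-length fields (certificate, masked model, local seed dictionary) can then be read by wrapping `rest()` again, instead of computing each range by hand.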

3 decouple the crypto and parsing parts

Signing and encrypting is the very last step when sending a message,
and decrypting and verifying the very first step when receiving one.
It is always exactly the same, so it makes sense to keep it separate.
Therefore we removed the open() and seal() methods from the messages
themselves and moved the logic to two dedicated types: MessageSeal for
signing and encrypting messages, and MessageOpener for decrypting
messages and verifying their signatures.

With this, we could now move the crypto logic to the transport
layer, and only handle fully fledged messages in the business logic
layer.

4 add a `Message` type that wraps sum, update and sum2 messages

pub struct Message {
    pub header: Header,
    pub payload: Payload,
}

pub enum Payload {
    Sum(SumMessage),
    Sum2(Sum2Message),
    Update(UpdateMessage),
}

This is actually needed if we want to move message
serialization/deserialization to Tokio.

By doing so, we don't have to repeat the same code for the fields that
are common to all the messages.
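A small sketch of how such a wrapper can be used follows. The payload structs here are placeholders with a single made-up `body` field, the `Header` is reduced to three fields, and `Payload::tag()`, `Message::new()` and `describe()` are hypothetical helpers, not functions from this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    Sum,
    Update,
    Sum2,
}

// Placeholder payloads: the real messages carry protocol-specific fields.
#[derive(Debug, PartialEq)]
pub struct SumMessage {
    pub body: Vec<u8>,
}
#[derive(Debug, PartialEq)]
pub struct UpdateMessage {
    pub body: Vec<u8>,
}
#[derive(Debug, PartialEq)]
pub struct Sum2Message {
    pub body: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum Payload {
    Sum(SumMessage),
    Sum2(Sum2Message),
    Update(UpdateMessage),
}

impl Payload {
    /// The tag is derived from the variant, so header and payload cannot disagree.
    pub fn tag(&self) -> Tag {
        match self {
            Payload::Sum(_) => Tag::Sum,
            Payload::Sum2(_) => Tag::Sum2,
            Payload::Update(_) => Tag::Update,
        }
    }
}

#[derive(Debug, PartialEq)]
pub struct Header {
    pub tag: Tag,
    pub coordinator_pk: [u8; 32],
    pub participant_pk: [u8; 32],
}

#[derive(Debug, PartialEq)]
pub struct Message {
    pub header: Header,
    pub payload: Payload,
}

impl Message {
    /// Build a message; the common fields are filled in once, here.
    pub fn new(coordinator_pk: [u8; 32], participant_pk: [u8; 32], payload: Payload) -> Self {
        let header = Header { tag: payload.tag(), coordinator_pk, participant_pk };
        Message { header, payload }
    }
}

/// A single entry point can accept any message and dispatch on the payload.
pub fn describe(message: &Message) -> String {
    match &message.payload {
        Payload::Sum(m) => format!("sum message, {} payload bytes", m.body.len()),
        Payload::Sum2(m) => format!("sum2 message, {} payload bytes", m.body.len()),
        Payload::Update(m) => format!("update message, {} payload bytes", m.body.len()),
    }
}
```

Having one `Message` type means a codec or channel only needs to be typed over `Message`, and the match above is where the business logic branches.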

janpetschexain

this looks good! some minor comments below.

about two code style choices regarding imports i noticed while reading (this is not so much related to only this mr but more in general for all of us):

can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line? this would help readability.

use std::{...};

use sodiumoxide::{...};
use derive_more::{...};

use crate::{...};
use self::{...};

can we agree on explicit imports only? the only exception from this rule could be use super::*; for test modules. i find it quite annoying to hunt down references to source code, especially when reviewing new code and when these import trails are further obfuscated by * re-exports, so this would help readability. it also avoids unwanted name conflicts as a side effect.

rust/src/crypto/hash.rs

rust/src/lib.rs

rust/src/message/traits.rs

rust/src/message/message.rs

rust/src/message/payload/mod.rs

rust/src/message/payload/update.rs

rust/src/participant.rs

little-dude · 2020-05-05T09:17:08Z

Thanks for the review!

can we agree on explicit imports only?

Yeah I actually follow this rule, except when I want to re-export an entire sub-module in a mod.rs. I usually do:

mod inner;
pub use self::inner::*;

Is that what you're talking about specifically? I'm not against not using this pattern. I think there's even a clippy lint for it (funnily, I opened the first clippy issue for that: rust-lang/rust-clippy#1228)

can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line?

I'm a bit less inclined to commit to this because:

- afaik there's no way to enforce it
- ~~rustfmt re-orders the imports I think (but maybe not if there's a newline in between? not sure)~~. Edit: I just checked, rustfmt doesn't re-order imports separated by a newline.
- I personally don't see much value in this

We could agree on doing that on a "best effort" basis though.

little-dude · 2020-05-05T09:19:46Z

i find it quite annoying to hunt down references to source code, especially when reviewing new code and when these import trails are further obfuscated by * re-exports

Not sure what editor you use but vscode, vim and emacs allow you to jump directly to a type or function definition. I think vscode even allows you nowadays to expand these *.

Robert-Steiner

Looks good to me.

rust/src/coordinator.rs

rust/src/message/traits.rs

janpetschexain · 2020-05-05T12:19:26Z

Thanks for the review!

can we agree on explicit imports only?

Yeah I actually follow this rule, except when I want to re-export an entire sub-module in a mod.rs. I usually do:
mod inner;
pub use self::inner::*;
Is that what you're talking about specifically? I'm not against not using this pattern. I think there's even a clippy lint for it (funnily, I opened the first clippy issue for that: rust-lang/rust-clippy#1228)

yes that's basically my main issue. i usually do code reviews right from the github web interface to add comments directly, therefore following imports is not straightforward. i agree that this is a non-issue for IDEs.

can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line?

I'm a bit less inclined to commit to this because:

afaik there's no way to enforce it

~~rustfmt re-orders the imports I think (but maybe not if there's a newline in between? not sure)~~. Edit: I just checked, rustfmt doesn't re-order imports separated by a newline.

I personally don't see much value in this

We could agree on doing that on a "best effort" basis though.

related to tracing import outside of IDEs it helps having structure there, but i agree that it is hard to enforce. best effort would be a good compromise.

rust/src/message/traits.rs

finiteprods

a very thorough refactoring job! 🥇

rust/src/message/payload/update.rs

rust/src/message/message.rs

rust/src/message/payload/update.rs

rust/src/message/payload/sum.rs

rust/src/message/payload/update.rs

rust/src/message/payload/sum2.rs

janpetschexain

partial review, i hope github doesn't eat my comments.

rust/Cargo.toml

rust/src/crypto/hash.rs

rust/src/crypto/sign.rs

rust/src/lib.rs

rust/src/mask/config.rs

rust/src/mask/masking.rs

janpetschexain

i finished reviewing the mask refactoring, looks good so far! this is quite an enormous refactoring, i think i got the broad hint to submit smaller merge requests myself ;P

next to some smaller issues it seems that some of the pet logic of the coordinator and participant went missing in the process.

rust/src/mask/mask_object/mod.rs

rust/src/mask/mask_object/serialization.rs

janpetschexain · 2020-05-19T11:14:20Z

rust/src/mask/mask_object/serialization.rs

+        let mut data = writer.data_mut();
+        let bytes_per_digit = self.config.bytes_per_digit();
+
+        for int in self.data.iter() {


i would suggest to use an enumerate() in the loop, then the reassignment of the data slice in line 112 can be avoided, instead the data buffer would be indexed (which might help readability):

for (idx, int) in self.data.iter().enumerate() {
    let bytes = int.to_bytes_le();
    data[range(idx*bytes_per_digit, bytes.len())].copy_from_slice(bytes.as_slice());
    data[range(idx*bytes_per_digit+bytes.len(), (idx+1)*bytes_per_digit)].copy_from_slice([0_u8; bytes_per_digit-bytes.len()].as_ref());
}
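A compilable variant of this suggestion could look like the sketch below. It is only an illustration: `write_fixed_width` is a made-up name, and plain `Vec<u8>` values stand in for the bytes that `BigUint::to_bytes_le()` would return, so that the example needs no external crate. Since an array length must be a compile-time constant in Rust, the padding is written with `fill` rather than copied from a zero array.

```rust
/// Write each integer (given as little-endian bytes) into its own slot of
/// exactly `bytes_per_digit` bytes, padding the unused high bytes with zeros.
pub fn write_fixed_width(ints: &[Vec<u8>], bytes_per_digit: usize, data: &mut [u8]) {
    assert_eq!(
        data.len(),
        ints.len() * bytes_per_digit,
        "output buffer has the wrong size"
    );
    for (idx, bytes) in ints.iter().enumerate() {
        assert!(
            bytes.len() <= bytes_per_digit,
            "integer does not fit in one slot"
        );
        // The slot for integer `idx` is found by indexing; `data` is never reassigned.
        let start = idx * bytes_per_digit;
        let slot = &mut data[start..start + bytes_per_digit];
        let (value, padding) = slot.split_at_mut(bytes.len());
        value.copy_from_slice(bytes);
        padding.fill(0);
    }
}
```

An alternative with the same effect would be to zip `ints` with `data.chunks_exact_mut(bytes_per_digit)`, which avoids the index arithmetic altogether.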

rust/src/mask/masking.rs

rust/src/participant.rs

rust/src/coordinator.rs

Summary:
========

1. add a `Header` type for the common fields
2. add a `LengthValueBuffer` type to handle the variable length fields
3. decouple the crypto and parsing parts
4. add a `Message` type that wraps sum, update and sum2 messages
5. have an `Owned` and a `Borrowed` version of every type
7. small bug fixes & improvements:
   - not using `usize` as a field since it's platform dependent
   - detection of truncated local seed dictionary
   - `XxxBuffer` does not allocate a `Vec`
   - remove impls for specific types (`impl TryFrom<Vec<u8>> for XxxBuffer`, `impl XxxBuffer<Vec<u8>>` etc)
   - make the `XxxBuffer()` APIs more consistent, opening the door to code generation via macros
   - use a custom `DecodeError` type that contains the whole stack of errors, which makes debugging easier

Details:
========

1. Introduce a `Header` type for the common fields
--------------------------------------------------

All the messages share common fields. In networking protocols, these common fields are usually handled separately from the rest of the message. They are usually called headers, and the rest of the message is the payload. We defined the following header:

```rust
/// A header, common to all the messages
pub struct Header<CK, PK, C> {
    /// Type of message
    pub tag: Tag,
    /// Coordinator public key
    pub coordinator_pk: CK,
    /// Participant public key
    pub participant_pk: PK,
    /// A certificate that identifies the author of the message
    pub certificate: Option<C>,
}
```

Currently the code to handle these common fields lives in the `MessageBuffer` trait which is implemented for each message type, thus reducing boilerplate for parsing these fields. I can see several downsides to using a trait for this though:

a. the `serialize()` and `deserialize()` methods must call the `MessageBuffer` methods for each of these common fields.
b. it is difficult to handle variable length fields in the header (which maybe is why the optional certificate is handled by each message separately?)
c. tests require full messages

`a.` is not too much of a problem, and I'm not sure to what extent `b.` holds true. But removing the header logic does lead to much simpler tests.

2. Add a `LengthValueBuffer` type to handle the variable length fields
----------------------------------------------------------------------

This reduces error-prone boilerplate, like we had in `update.rs`:

```rust
if buffer.len() >= Self::LOCAL_SEED_DICT_LEN_RANGE.end {
    buffer.certificate_range = Self::LOCAL_SEED_DICT_LEN_RANGE.end
        ..Self::LOCAL_SEED_DICT_LEN_RANGE.end
            + usize::from_le_bytes(buffer.certificate_len().try_into().unwrap());
    buffer.masked_model_range = buffer.certificate_range.end
        ..buffer.certificate_range.end
            + usize::from_le_bytes(buffer.masked_model_len().try_into().unwrap());
    buffer.local_seed_dict_range = buffer.masked_model_range.end
        ..buffer.masked_model_range.end
            + usize::from_le_bytes(buffer.local_seed_dict_len().try_into().unwrap());
}
```

This also allows us to automate the serialization/deserialization of variable-length types like masks, certificates and masked models (see `impl_traits_for_length_value_types!`)

3. decouple the crypto and parsing parts
----------------------------------------

The message signature and encryption is the very last/first step for every message, and is always exactly the same, so it makes sense to keep it separate. Therefore we removed the `open()` and `seal()` methods for the messages themselves and moved the logic to two dedicated types: `MessageSeal` for signing and encrypting messages, and `MessageOpener` for decrypting and verifying message signatures.

With this, we could now move the crypto logic to the _transport_ layer, and only handle fully fledged messages in the business logic layer.

4. add a `Message` type that wraps sum, update and sum2 messages
----------------------------------------------------------------

```rust
pub struct Message {
    pub header: Header,
    pub payload: Payload,
}

pub enum Payload {
    Sum(SumMessage),
    Sum2(Sum2Message),
    Update(UpdateMessage),
}
```

This is actually needed if we want to move message serialization/deserialization to Tokio.

By doing so, we don't have to repeat the same code for the fields that are common to all the messages.

5. have an `Owned` and a `Borrowed` version of every type
---------------------------------------------------------

Each type comes in two flavours: `Owned` and `Borrowed`. An `XxxOwned` type _owns_ its fields, while an `XxxBorrowed` type may only have references to some of the fields. The `Borrowed` variant is needed because we have some potentially large fields that we don't want to clone when emitting a message. For instance, it would be wasteful for an update participant sending an update message with a large local seed dictionary to clone that dictionary.
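One way to get both flavours from a single definition is to make the message generic over how the large field is held. The sketch below shows that approach under stated assumptions: the `LocalSeedDict` alias, the single-field `UpdateMessage`, the `UpdateOwned`/`UpdateBorrowed` aliases and the byte layout in `to_bytes()` are invented for the example and are not the types from this commit.

```rust
use std::borrow::Borrow;
use std::collections::BTreeMap;

/// Placeholder for the potentially large dictionary (participant id -> encrypted seed).
pub type LocalSeedDict = BTreeMap<u32, Vec<u8>>;

/// Generic over the way the dictionary is held: by value or by reference.
pub struct UpdateMessage<D> {
    pub local_seed_dict: D,
}

/// Owned flavour: the message owns its dictionary.
pub type UpdateOwned = UpdateMessage<LocalSeedDict>;
/// Borrowed flavour: the message only references a dictionary owned elsewhere.
pub type UpdateBorrowed<'a> = UpdateMessage<&'a LocalSeedDict>;

impl<D: Borrow<LocalSeedDict>> UpdateMessage<D> {
    /// Serialize without caring whether the dictionary is owned or borrowed.
    /// Assumed layout per entry: id (u32 LE) | seed length (u32 LE) | seed.
    pub fn to_bytes(&self) -> Vec<u8> {
        let dict: &LocalSeedDict = self.local_seed_dict.borrow();
        let mut out = Vec::new();
        for (id, seed) in dict {
            out.extend_from_slice(&id.to_le_bytes());
            out.extend_from_slice(&(seed.len() as u32).to_le_bytes());
            out.extend_from_slice(seed);
        }
        out
    }
}
```

A participant can then build an `UpdateBorrowed` that points at its dictionary just to emit the message, while the receiving side parses into an `UpdateOwned`.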

finiteprods

the latest looks very good to me (although the size and complexity of this merge means i'm not confident i've covered it all!). maybe you could finish off the merge message to summarise the changes specific to masking, when you have time later on (or write it in a separate note).

little-dude · 2020-05-25T07:22:29Z

the latest looks very good to me (although the size and complexity of this merge means i'm not confident i've covered it all!). maybe you could finish off the merge message to summarise the changes specific to masking, when you have time later on (or write it in a separate note).

Thanks for reviewing. I'll update the commit message and PR description.

finiteprods

a few further (minor) comments, just documentation points and a general question. feel free to deal with later at your leisure.

rust/src/mask/model.rs

rust/src/mask/object/serialization.rs

janpetschexain

the counting issue for Aggregation should be addressed before merging, other than that it's good to go!

rust/src/coordinator.rs

rust/src/mask/masking.rs

janpetschexain · 2020-05-25T09:56:36Z

rust/src/mask/masking.rs

+
+/// Generate a secure pseudo-random integer. Draws from a uniform distribution over the integers
+/// between zero (included) and `max_int` (excluded).
+pub fn generate_integer(prng: &mut ChaCha20Rng, max_int: &BigUint) -> BigUint {


this is still the case, we should keep it in one place.

This commit is very large, so I can't list all of the changes that were made. Here are the most important ones.

More compact BigUint encoding for mask objects
----------------------------------------------

Instead of encoding each integer with its length, encode all the integers with the same number of bytes. This doesn't waste too much space due to the random nature of the data: the probability that we waste the `n` most significant bits is 1/2^n.

Do not use generics to encode the primitive type used in the models
-------------------------------------------------------------------

Using `Model<N>` forces us to choose at compile time whether our models will use f32, f64, i32 or i64. Such a restriction is not acceptable for our use case.

**Important**: This PR does not completely get rid of `Model<N>` yet! It only limits its reach: for instance the coordinator is not generic over `N` anymore.

Do not use macros to implement getters/setters
----------------------------------------------

Getters/setters are not idiomatic in Rust. See this reddit discussion: https://www.reddit.com/r/rust/comments/6etrr1/a_derive_for_your_basic_getters_and_setters/

Make cloning of models and mask objects more explicit
-----------------------------------------------------

- Use APIs that take ownership of mask objects and models: masking, unmasking and aggregation are operations that _consume_ their input. If the caller wants to copy the data beforehand it is their responsibility
- Use iterators instead of vectors, for instance for model conversion. Currently we still collect these iterators into vectors, but in the future we could try to pass them around. `Iterator.map` being lazy, this could result in decent performance gains.

Using iterators everywhere would still require a lot of work: currently, mask objects are vectors for instance. This commit is just a first step in that direction.
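The last two points can be sketched as follows. This is a toy model only: the `MaskObject` here holds `u64` values instead of `BigUint`, and `Aggregation::aggregate()`, `into_inner()` and `weights_to_ints()` are hypothetical names chosen for the example, not the APIs introduced by this commit.

```rust
/// Toy mask object: integers modulo `modulus` (the real type uses `BigUint`).
#[derive(Debug, Clone, PartialEq)]
pub struct MaskObject {
    pub data: Vec<u64>,
    pub modulus: u64,
}

pub struct Aggregation {
    object: MaskObject,
}

impl Aggregation {
    /// Takes ownership of the first object instead of cloning it.
    pub fn new(first: MaskObject) -> Self {
        Aggregation { object: first }
    }

    /// Consumes `other`: a caller that still needs it must clone explicitly.
    /// Assumes every value is below `modulus` and `modulus <= 2^63`, so the
    /// addition cannot overflow.
    pub fn aggregate(&mut self, other: MaskObject) {
        assert_eq!(self.object.modulus, other.modulus, "moduli differ");
        assert_eq!(self.object.data.len(), other.data.len(), "lengths differ");
        let modulus = self.object.modulus;
        for (acc, value) in self.object.data.iter_mut().zip(other.data) {
            *acc = (*acc + value) % modulus;
        }
    }

    /// Hand the aggregated object back to the caller.
    pub fn into_inner(self) -> MaskObject {
        self.object
    }
}

/// Lazily convert weights to scaled integers: nothing is computed until the
/// returned iterator is consumed, and no intermediate vector is allocated.
pub fn weights_to_ints(
    weights: impl Iterator<Item = f32>,
    scale: f32,
) -> impl Iterator<Item = i64> {
    weights.map(move |w| (w * scale).round() as i64)
}
```

Because `aggregate()` takes its argument by value, any copy shows up as an explicit `.clone()` at the call site, which is the point of making cloning explicit.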

little-dude requested review from janpetschexain, Robert-Steiner and finiteprods May 4, 2020 07:56

little-dude force-pushed the refactor-message branch 3 times, most recently from 87be131 to f710ace May 4, 2020 16:46

little-dude mentioned this pull request May 5, 2020: PB-610 masking #396 (merged)

janpetschexain reviewed May 5, 2020

Robert-Steiner reviewed May 5, 2020

janpetschexain reviewed May 6, 2020

finiteprods reviewed May 11, 2020

little-dude force-pushed the refactor-message branch 7 times, most recently from 5239ef0 to 50b641f May 19, 2020 09:26

janpetschexain reviewed May 19, 2020

janpetschexain suggested changes May 19, 2020

little-dude force-pushed the refactor-message branch 4 times, most recently from 91b3289 to c460eca May 22, 2020 14:54

little-dude force-pushed the refactor-message branch 2 times, most recently from a293c2f to f33751f May 25, 2020 06:01

little-dude force-pushed the refactor-message branch 4 times, most recently from 29d6e55 to 9b88ffb May 25, 2020 06:33

finiteprods reviewed May 25, 2020

little-dude force-pushed the refactor-message branch from a916916 to 2a1eb37 May 25, 2020 07:21

little-dude changed the title ~~Refactor messages~~ Refactor messages and masking May 25, 2020

little-dude requested a review from janpetschexain May 25, 2020 07:45

finiteprods reviewed May 25, 2020

janpetschexain suggested changes May 25, 2020

little-dude force-pushed the refactor-message branch from 2a1eb37 to d1d06f5 May 25, 2020 11:51

little-dude requested review from janpetschexain and finiteprods May 25, 2020 11:52

finiteprods approved these changes May 25, 2020

janpetschexain approved these changes May 25, 2020

little-dude force-pushed the refactor-message branch from d1d06f5 to 5e957d8 May 25, 2020 12:23

little-dude merged commit b515515 into xaynetwork:pet May 25, 2020

little-dude deleted the refactor-message branch May 25, 2020 12:42

little-dude added a commit that referenced this pull request May 25, 2020: re-add unmasking logic (1317c3c)

This was removed by mistake in #398

little-dude mentioned this pull request May 25, 2020: re-add unmasking logic #410 (merged)
