Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Refactor messages and masking #398

Merged
merged 2 commits into from
May 25, 2020

Conversation

little-dude
Copy link
Contributor

@little-dude little-dude commented May 4, 2020

This is a first step toward performing serialization/deserialization in a separate transport layer. It is not finished, but since some changes may be controversial I figured I'd open it a bit early to start the discussion. It is easier to review this with the documentation (although this is not yet fully documented).

The diff is slightly positive but this is mostly due to the additional docstrings. Otherwise, actual number of line of code went down a little edit: disregard this comment, I think this is mostly due to some tests missing. The line count is roughly the same.

TODO:

  • Header tests and doc
  • MessageBuffer tests and doc
  • Payload tests and doc
  • fuzzing
  • benchmarks

Summary:

  1. add a Header type for the common fields
  2. add a LengthValueBuffer type to handle the variable length fields
  3. decouple the crypto and parsing parts
  4. add a Message type that wraps sum, update and sum2 messages
  5. have an Owned and a Borrowed version of every type
  6. small bug fixes & improvements:
    • not using usize as a field since it's platform dependent
    • detection of truncated local seed dictionary
    • XxxBuffer does not allocate a Vec
    • remove impls for specific types (impl TryFrom<Vec<u8>> for XxxBuffer, impl XxxBuffer<Vec<u8>> etc)
    • make the XxxBuffer() APIs more consistent, opening the door
      to code generation via macros
    • use a custom DecodeError type that contains the whole stack
      of errors, which makes debugging easier

Details:

1 Introduce a Header type for the common fields

All the messages share common fields. In networking protocol, these
common fields are usually handled separately from the rest of the
message. They are usually called headers, and the rest of the message
is the payload. We defined the following header:

/// A header, common to all the message
pub struct Header<CK, PK, C> {
    /// Type of message
    pub tag: Tag,
    /// Coordinator public key
    pub coordinator_pk: CK,
    /// Participant public key
    pub participant_pk: PK,
    /// A certificate that identifies the author of the message
    pub certificate: Option<C>,
}

Currently the code to handle these common fields lives in the
MessageBuffer trait which is implemented for each message type, thus
reducing boilerplate for parsing these fields. I can see several
downsides to using a trait for this though:

a. the serialize() and deserialize() methods must call the
MessageBuffer methods for each of these common fields.
b. it is difficult to handle variable length fields in the
header (which maybe is why the optional certificate is handled by
each message separately?)
c. tests require full messages

a. is not too much of a problem, and I'm not sure to what extent
b. holds true. But removing the header logic does lead to much
simpler tests.

2 Add a LengthValueBuffer type to handle the variable length fields

This reduces error prone logic, like we had in update.rs:

if buffer.len() >= Self::LOCAL_SEED_DICT_LEN_RANGE.end {
    buffer.certificate_range = Self::LOCAL_SEED_DICT_LEN_RANGE.end
        ..Self::LOCAL_SEED_DICT_LEN_RANGE.end
            + usize::from_le_bytes(buffer.certificate_len().try_into().unwrap());
    buffer.masked_model_range = buffer.certificate_range.end
        ..buffer.certificate_range.end
            + usize::from_le_bytes(buffer.masked_model_len().try_into().unwrap());
    buffer.local_seed_dict_range = buffer.masked_model_range.end
        ..buffer.masked_model_range.end
            + usize::from_le_bytes(buffer.local_seed_dict_len().try_into().unwrap());
}

This also allows us to automate the serialization/deserialization of
length-variable types like masks, certificates and masked models (see
impl_traits_for_length_value_types!)

3 decouple the crypto and parsing parts

The message signature and encryption is the very last/first step for
every message, and is always exactly the same, so it makes sense to
keep it separate. Therefore we removed the open() and seal()
methods for the messages themself and moved the logic to two dedicated
types: MessageSeal for signing and encrypting messages, and
MessageOpener for decrypting and verifying message signatures.

With this, we could now move the crypto logic to the transport
layer, and only handle fully fledged messages in the business logic
layer.

4 add a Message type that wraps sum, update and sum2 messages

pub struct Message {
    pub header: Header,
    pub payload: Payload,
}

pub enum Payload {
    Sum(SumMessage),
    Sum2(Sum2Message),
    Update(UpdateMessage),
}

This is actually needed if we want to move message
serialization/deserialization to Tokio.

By doing so, we don't have to repeat the same code for the fields that
are common to all the messages.

@little-dude little-dude force-pushed the refactor-message branch 3 times, most recently from 87be131 to f710ace Compare May 4, 2020 16:46
@little-dude little-dude mentioned this pull request May 5, 2020
Copy link
Contributor

@janpetschexain janpetschexain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good! some minor comments below.

about two code style choicesregarding imports i noticed while reading (this is not so much related to only this mr but more in general for all of us):

  • can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line? this would help readability.
use std::{...};

use sodiumoxide::{...};
use derive_more::{...};

use crate::{:::};
use self::{...};
  • can we agree on explicit imports only? the only exception from this rule could be use super::*; for test modules. i find it quite annoying to hunt down references to source code, especially when reviewing new code and when these import trails are further obfuscated by * re-exports, so this would help readability. it also avoids unwanted name conflicts as a side effect.

rust/src/crypto/hash.rs Outdated Show resolved Hide resolved
rust/src/lib.rs Show resolved Hide resolved
rust/src/message/traits.rs Show resolved Hide resolved
rust/src/message/message.rs Show resolved Hide resolved
rust/src/message/payload/mod.rs Outdated Show resolved Hide resolved
rust/src/message/payload/update.rs Show resolved Hide resolved
rust/src/participant.rs Show resolved Hide resolved
@little-dude
Copy link
Contributor Author

little-dude commented May 5, 2020

Thanks for the review!

can we agree on explicit imports only?

Yeah I actually follow this rule, except when I want to re-export an entire sub-module in a mod.rs. I usually do:

mod inner;
pub use self::inner::*;

Is that what you're talking about specifically? I'm not against not using this pattern. I think there's even a clippy lint for it (funnily, I opened the first clippy issue for that: rust-lang/rust-clippy#1228)

can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line?

I'm a bit less inclined to commit to this because:

  • afaik there's no way to enforce it
  • rustfmt re-orders the imports I think (but maybe not if there's a newline in between? not sure). Edit: I just checked, rusfmt doesn't re-order imports separated by a newline.
  • I personally don't see much value in this

We could agree on doing that on a "best effort" basis though.

@little-dude
Copy link
Contributor Author

little-dude commented May 5, 2020

i find it quite annoying to hunt down references to source code, especially when reviewing new code and when these import trails are further obfuscated by * re-exports

Not sure what editor you use but vscode, vim and emacs allow you do jump directly to a type of function definition. I think vscode even allows you nowadays to expand these *.

Copy link
Contributor

@Robert-Steiner Robert-Steiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

rust/src/coordinator.rs Show resolved Hide resolved
rust/src/message/traits.rs Show resolved Hide resolved
@janpetschexain
Copy link
Contributor

Thanks for the review!

can we agree on explicit imports only?

Yeah I actually follow this rule, except when I want to re-export an entire sub-module in a mod.rs. I usually do:

mod inner;
pub use self::inner::*;

Is that what you're talking about specifically? I'm not against not using this pattern. I think there's even a clippy lint for it (funnily, I opened the first clippy issue for that: rust-lang/rust-clippy#1228)

yes that's basically my main issue. i usually do code reviews right from the github web interface to add comments directly, therefore following imports is not straight forward. i agree that this is a non-issue for IDEs.

can we agree on structuring the imports by blocks of standard, 3rd party and own, each separated with a blank line?

I'm a bit less inclined to commit to this because:

  • afaik there's no way to enforce it
  • rustfmt re-orders the imports I think (but maybe not if there's a newline in between? not sure). Edit: I just checked, rusfmt doesn't re-order imports separated by a newline.
  • I personally don't see much value in this

We could agree on doing that on a "best effort" basis though.

related to tracing import outside of IDEs it helps having structure there, but i agree that it is hard to enforce. best effort would be a good compromise.

Copy link
Contributor

@finiteprods finiteprods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a very thorough refactoring job! 🥇

rust/src/message/payload/update.rs Outdated Show resolved Hide resolved
rust/src/message/payload/update.rs Outdated Show resolved Hide resolved
rust/src/message/message.rs Outdated Show resolved Hide resolved
rust/src/message/message.rs Show resolved Hide resolved
rust/src/message/message.rs Outdated Show resolved Hide resolved
rust/src/message/payload/update.rs Outdated Show resolved Hide resolved
rust/src/message/payload/sum.rs Outdated Show resolved Hide resolved
rust/src/message/payload/update.rs Outdated Show resolved Hide resolved
rust/src/message/payload/sum2.rs Outdated Show resolved Hide resolved
rust/src/message/payload/sum2.rs Outdated Show resolved Hide resolved
@little-dude little-dude force-pushed the refactor-message branch 7 times, most recently from 5239ef0 to 50b641f Compare May 19, 2020 09:26
Copy link
Contributor

@janpetschexain janpetschexain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review, i hope github doesn't eat my comments.

rust/Cargo.toml Outdated Show resolved Hide resolved
rust/src/crypto/hash.rs Outdated Show resolved Hide resolved
rust/src/crypto/sign.rs Outdated Show resolved Hide resolved
rust/src/lib.rs Outdated Show resolved Hide resolved
rust/src/mask/config.rs Outdated Show resolved Hide resolved
rust/src/mask/config.rs Outdated Show resolved Hide resolved
rust/src/mask/config.rs Outdated Show resolved Hide resolved
rust/src/mask/config.rs Outdated Show resolved Hide resolved
rust/src/mask/masking.rs Outdated Show resolved Hide resolved
rust/src/mask/masking.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@janpetschexain janpetschexain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i finished reviewing the mask refactoring, looks good so far! this is quite an enormous refactoring, i think i got the broad hint to submit smaller merge requests myself ;P

next to some smaller issues it seems that some of the pet logic of the coordinator and participant went missing in the process.

rust/src/mask/mask_object/mod.rs Outdated Show resolved Hide resolved
rust/src/mask/mask_object/serialization.rs Outdated Show resolved Hide resolved
rust/src/mask/mask_object/serialization.rs Outdated Show resolved Hide resolved
let mut data = writer.data_mut();
let bytes_per_digit = self.config.bytes_per_digit();

for int in self.data.iter() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would suggest to use an enumerate() in the loop, then the reassignment of the data slice in line 112 can be avoided, instead the data buffer would be indexed (which might help readability):

for (idx, int) in self.data.iter().enumerate() {
    let bytes = int.to_bytes_le();
    data[range(idx*bytes_per_digit, bytes.len())].copy_from_slice(bytes.as_slice());
    data[range(idx*bytes_per_digit+bytes.len(), (idx+1)*bytes_per_digit)].copy_from_slice([0_u8; bytes_per_digit-bytes.len()].as_ref());
}

rust/src/mask/masking.rs Outdated Show resolved Hide resolved
rust/src/participant.rs Outdated Show resolved Hide resolved
rust/src/coordinator.rs Show resolved Hide resolved
rust/src/coordinator.rs Show resolved Hide resolved
rust/src/coordinator.rs Show resolved Hide resolved
rust/src/coordinator.rs Show resolved Hide resolved
@little-dude little-dude force-pushed the refactor-message branch 4 times, most recently from 91b3289 to c460eca Compare May 22, 2020 14:54
Summary:
========

1. add a `Header` type for the common fields
2. add a `LengthValueBuffer` type to handle the variable length fields
3. decouple the crypto and parsing parts
4. add a `Message` type that wraps sum, update and sum2 messages
5. have an `Owned` and a `Borrowed` version of every type
7. small bug fixes & improvements:
     - not using `usize` as a field since it's platform dependent
     - detection of truncated local seed dictionary
     - `XxxBuffer` does not allocate a `Vec`
     - remove impls for specific types (`impl TryFrom<Vec<u8>> for
       XxxBuffer`, `impl XxxBuffer<Vec<u8>>` etc)
     - make the `XxxBuffer()` APIs more consistent, opening the door
       to code generation via macros
     - use a custom `DecodeError` type that contains the whole stack
       of errors, which makes debugging easier

Details:
========

1. Introduce a `Header` type for the common fields
--------------------------------------------------

All the messages share common fields. In networking protocol, these
common fields are usually handled separately from the rest of the
message. They are usually called headers, and the rest of the message
is the payload. We defined the following header:

```rust
/// A header, common to all the message
pub struct Header<CK, PK, C> {
    /// Type of message
    pub tag: Tag,
    /// Coordinator public key
    pub coordinator_pk: CK,
    /// Participant public key
    pub participant_pk: PK,
    /// A certificate that identifies the author of the message
    pub certificate: Option<C>,
}
```

Currently the code to handle these common fields lives in the
`MessageBuffer` trait which is implemented for each message type, thus
reducing boilerplate for parsing these fields. I can see several
downsides to using a trait for this though:

a. the `serialize()` and `deserialize()` methods must call the
   `MessageBuffer` methods for each of these common fields.
b. it is difficult to handle variable length fields in the
   header (which maybe is why the optional certificate is handled by
   each message separately?)
c. tests require full messages

`a.` is not too much of a problem, and I'm not sure to what extent
`b.` holds true. But removing the header logic does lead to much
simpler tests.

2. Add a `LengthValueBuffer` type to handle the variable length fields
----------------------------------------------------------------------

This reduces error prone boilerplate, like we had in `update.rs`:

```rust
if buffer.len() >= Self::LOCAL_SEED_DICT_LEN_RANGE.end {
    buffer.certificate_range = Self::LOCAL_SEED_DICT_LEN_RANGE.end
        ..Self::LOCAL_SEED_DICT_LEN_RANGE.end
            + usize::from_le_bytes(buffer.certificate_len().try_into().unwrap());
    buffer.masked_model_range = buffer.certificate_range.end
        ..buffer.certificate_range.end
            + usize::from_le_bytes(buffer.masked_model_len().try_into().unwrap());
    buffer.local_seed_dict_range = buffer.masked_model_range.end
        ..buffer.masked_model_range.end
            + usize::from_le_bytes(buffer.local_seed_dict_len().try_into().unwrap());
}
```

This also allows us to automate the serialization/deserialization of
length-variable types like masks, certificates and masked models (see
`impl_traits_for_length_value_types!`)

3. decouple the crypto and parsing parts
----------------------------------------

The message signature and encryption is the very last/first step for
every message, and is always exactly the same, so it makes sense to
keep it separate. Therefore we removed the `open()` and `seal()`
methods for the messages themself and moved the logic to two dedicated
types: `MessageSeal` for signing and encrypting messages, and
`MessageOpener` for decrypting and verifying message signatures.

With this, we could now move the crypto logic to the _transport_
layer, and only handle fully fledged messages in the business logic
layer.

4. add a `Message` type that wraps sum, update and sum2 messages
----------------------------------------------------------------

```rust
pub struct Message {
    pub header: Header,
    pub payload: Payload,
}

pub enum Payload {
    Sum(SumMessage),
    Sum2(Sum2Message),
    Update(UpdateMessage),
}
```

This is actually needed if we want to move message
serialization/deserialization to Tokio.

By doing so, we don't have to repeat the same code for the fields that
are common to all the messages.

5. have an `Owned` and a `Borrowed` version of every type
---------------------------------------------------------

Each type comes in two flavours: `Owned` and `Borrowed`. An `XxxOwned`
type _owns_ their fields, while an `XxxBorrowed` type may only have
references to some of the fields.

The `Borrowed` variant is needed because we have some potentially
large fields that we don't want to clone when emitting a message. For
instance, it would be wasteful for an update participant sending an
update message with a large local seed dictionary to clone that
dictionary.
@little-dude little-dude force-pushed the refactor-message branch 2 times, most recently from a293c2f to f33751f Compare May 25, 2020 06:01
@little-dude little-dude force-pushed the refactor-message branch 4 times, most recently from 29d6e55 to 9b88ffb Compare May 25, 2020 06:33
Copy link
Contributor

@finiteprods finiteprods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the latest looks very good to me (although the size and complexity of this merge means i'm not confident i've covered it all!). maybe you could finish off the merge message to summarise the changes specific to masking, when you have time later on (or write it in a separate note).

@little-dude little-dude changed the title Refactor messages Refactor messages and masking May 25, 2020
@little-dude
Copy link
Contributor Author

the latest looks very good to me (although the size and complexity of this merge means i'm not confident i've covered it all!). maybe you could finish off the merge message to summarise the changes specific to masking, when you have time later on (or write it in a separate note).

Thanks for reviewing. I'll update the commit message and PR description.

Copy link
Contributor

@finiteprods finiteprods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few further (minor) comments, just documentation points and a general question. feel free to deal with later at your leisure.

rust/src/mask/model.rs Show resolved Hide resolved
rust/src/mask/model.rs Outdated Show resolved Hide resolved
rust/src/mask/object/serialization.rs Show resolved Hide resolved
Copy link
Contributor

@janpetschexain janpetschexain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the counting issue for Aggregation should be addressed before merging, other than that it's good to go!

rust/src/coordinator.rs Outdated Show resolved Hide resolved
rust/src/coordinator.rs Outdated Show resolved Hide resolved
rust/src/mask/masking.rs Outdated Show resolved Hide resolved
rust/src/mask/masking.rs Outdated Show resolved Hide resolved
rust/src/mask/masking.rs Outdated Show resolved Hide resolved

/// Generate a secure pseudo-random integer. Draws from a uniform distribution over the integers
/// between zero (included) and `max_int` (excluded).
pub fn generate_integer(prng: &mut ChaCha20Rng, max_int: &BigUint) -> BigUint {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still the case, we should keep it in one place.

This commit is very large, so I can't list of the changes that were
made. Here are the most important ones.

More compact BigUint encoding for mask objects
----------------------------------------------

Instead encoding each integer with its length, encode all the integers
with on the same number of bytes. This doesn't waste too much space
due to the random nature of data: the probability that we waste the
`n` most significant bits is 1/2^n.

Do not use generics to encode the primitive type used in the models
-------------------------------------------------------------------

Using `Model<N>` forces us to chose at compile time whether our models
will use f32, f64, i32 or i64. Such restriction is not acceptable for
our use case.

**Important**: This PR does not completely get rid of `Model<N>` yet!
It only limits its reach: for instance the coordinator is not generic
over `N` anymore.

Do not use macros to implement getters/setters
----------------------------------------------

Getters/setters are not idiomatic in Rust. See this reddit discussion:
https://www.reddit.com/r/rust/comments/6etrr1/a_derive_for_your_basic_getters_and_setters/

Make cloning of models and mask objects more explicit
-----------------------------------------------------

- Use APIs that take ownership of mask object and models: masking,
  unmasking and aggregation are operations that _consume_ their
  input. If the caller want to copy the data before-hand it is their
  responsability
- Use iterators instead of vectors, for instance for model
  conversion. Currently we still collect these iterators into vectors,
  but in the future we could try pass them around. `Iterator.map`
  being lazy, this could result in decent performance gains.

Using iterators everywhere would still require a lot of work:
currently, mask objects are vectors for instance. This commit is just
a first step in that direction.
@little-dude little-dude merged commit b515515 into xaynetwork:pet May 25, 2020
@little-dude little-dude deleted the refactor-message branch May 25, 2020 12:42
little-dude added a commit that referenced this pull request May 25, 2020
This was removed by mistake in
#398
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants