Migrate from protoc to quick-protobuf #3066
Conversation
@@ -174,42 +173,37 @@ impl Keypair {
        }
    };

-   Ok(pk.encode_to_vec())
+   Ok(quick_protobuf::serialize_into_vec(&pk).expect("Encoding to succeed."))
Looking at the code, I believe this is infallible. Unfortunately, I have not found a concrete answer in the docs or issues.
I believe so too. One error I could find is https://docs.rs/quick-protobuf/latest/src/quick_protobuf/writer.rs.html#379, which can't happen if the `Vec` is allocated correctly.

The other error path comes from lines like https://github.com/tafia/quick-protobuf/blob/d977371e05170a016f03a80512b2a925468e3a1a/quick-protobuf/examples/pb_rs/data_types.rs#L199, which end up being the same error case as above, so I think we are good.
I believe this function could actually be made infallible in `quick-protobuf`, but I am not sure.

If we want to use […]
Thank you for making this patch!

> Is it likely that we will need any of those unsupported features?

I don't think so. We only use protobufs to have a language-agnostic message description language, so we are unlikely to use services, for example.

I am not quite sure what `option` refers to? Is that talking about `optional` fields from proto3?

> Any other concerns?
- The verbosity of the build script is annoying but that shouldn't block a transition.
- This is a bit concerning: Fix owned deref tafia/quick-protobuf#153. I don't like that there is `unsafe` in (some of) the generated code.

Overall, I like that we can use `Cow` in a few places now, and interacting with the generated code seems a lot simpler!
It also seems that @snproj has recently been active on the repository as a maintainer? I hope you don't mind being tagged! :)
I'd like to hear @mxinden's opinion before we move forward here. Personally, I am in favor, assuming the above `unsafe` problem is fixed.
core/build.rs
Outdated
let out_dir = std::env::var("OUT_DIR").unwrap();
let out_dir = Path::new(&out_dir).join("protos");

let in_dir = PathBuf::from(::std::env::var("CARGO_MANIFEST_DIR").unwrap()).join("src");

// Find all *.proto files in the `in_dir` and add them to the list of files
let mut protos = Vec::new();
let proto_ext = Some(Path::new("proto").as_os_str());

for entry in std::fs::read_dir(&in_dir).unwrap() {
    let path = entry.unwrap().path();
    if path.extension() == proto_ext {
        protos.push(path);
    }
}

// Delete all old generated files before re-generating new ones
if out_dir.exists() {
    std::fs::remove_dir_all(&out_dir).unwrap();
}

std::fs::DirBuilder::new().create(&out_dir).unwrap();
let config_builder = ConfigBuilder::new(&protos, None, Some(&out_dir), &[in_dir]).unwrap();
FileDescriptor::run(&config_builder.build()).unwrap()
This is kind of annoying boilerplate but I could live with that. Worst case we could always make a little crate that abstracts things away for us.
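For illustration, here is a rough sketch of what such a helper crate could expose: a single function that just repackages the pb-rs calls from the build script above. The function name and crate layout are made up.

```rust
// Hypothetical build helper; it only bundles the pb-rs steps shown above.
use pb_rs::{types::FileDescriptor, ConfigBuilder};
use std::path::{Path, PathBuf};

pub fn compile_protos_in(src_dir: &Path) {
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap()).join("protos");

    // Collect every *.proto file directly inside `src_dir`.
    let protos: Vec<PathBuf> = std::fs::read_dir(src_dir)
        .unwrap()
        .map(|entry| entry.unwrap().path())
        .filter(|path| path.extension().map_or(false, |ext| ext == "proto"))
        .collect();

    // Wipe stale output before regenerating.
    if out_dir.exists() {
        std::fs::remove_dir_all(&out_dir).unwrap();
    }
    std::fs::create_dir_all(&out_dir).unwrap();

    let config_builder =
        ConfigBuilder::new(&protos, None, Some(&out_dir), &[src_dir.to_path_buf()]).unwrap();
    FileDescriptor::run(&config_builder.build()).unwrap();
}
```

Each crate's `build.rs` would then shrink to a single call pointing at its `src` directory.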
core/src/identity.rs
Outdated
let mut private_key: zeroize::Zeroizing<keys_proto::PrivateKey> =
    quick_protobuf::deserialize_from_slice(bytes)
Nit: I'd prefer using turbo-fish here.
-let mut private_key: zeroize::Zeroizing<keys_proto::PrivateKey> =
-    quick_protobuf::deserialize_from_slice(bytes)
+let mut private_key =
+    quick_protobuf::deserialize_from_slice::<keys_proto::PrivateKey>(bytes)
-r#type: keys_proto::KeyType::Ed25519 as i32,
-data: key.encode().to_vec(),
+Type: keys_proto::KeyType::Ed25519,
+Data: Cow::from(key.encode().to_vec()),
I think the usage of `Cow` here would actually allow us to remove an allocation. An Ed25519 key in particular can also be viewed as a byte slice, so we could use `Cow::Borrowed` here.
Seems like `encode()` returns an array, which creates a temporary that goes out of scope, so we can't remove the allocation.
You should be able to add `pub(crate) fn as_bytes(&self) -> &[u8]` to `ed25519::PublicKey`, which then allows the use of `Cow::Borrowed`.
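A minimal, self-contained sketch of that idea. `PublicKey` and `ProtoPublicKey` below are stand-ins for the real `ed25519::PublicKey` and the generated `keys_proto::PublicKey`; the point is only that a borrowing accessor lets the proto struct hold a `Cow::Borrowed` instead of allocating.

```rust
use std::borrow::Cow;

// Stand-in for `ed25519::PublicKey`.
pub struct PublicKey([u8; 32]);

// Stand-in for the generated struct's bytes field.
pub struct ProtoPublicKey<'a> {
    pub data: Cow<'a, [u8]>,
}

impl PublicKey {
    /// Mirrors the existing API: returns an owned array, i.e. a temporary,
    /// which is why `Cow::from(key.encode().to_vec())` has to allocate.
    pub fn encode(&self) -> [u8; 32] {
        self.0
    }

    /// The suggested accessor: view the key as a byte slice.
    pub(crate) fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

fn to_proto(key: &PublicKey) -> ProtoPublicKey<'_> {
    ProtoPublicKey {
        // Borrows from `key` for as long as the proto struct lives; no copy.
        data: Cow::Borrowed(key.as_bytes()),
    }
}

fn main() {
    let key = PublicKey([7; 32]);
    assert!(matches!(to_proto(&key).data, Cow::Borrowed(_)));
    let _allocating_alternative = Cow::<[u8]>::from(key.encode().to_vec());
}
```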
core/src/signed_envelope.rs
Outdated
payload_type: Cow::from(self.payload_type),
payload: Cow::from(self.payload),
signature: Cow::from(self.signature),
These could all be borrowed :)
core/src/peer_record.rs
Outdated
-.map(|m| peer_record_proto::peer_record::AddressInfo {
-    multiaddr: m.to_vec(),
+.map(|m| peer_record_proto::mod_PeerRecord::AddressInfo {
+    multiaddr: Cow::from(m.to_vec()),
This one could be borrowed too I think.
Looks like there was some good activity: this PR and a follow-up got merged and released in the last few days.

Hi! Didn't see this earlier 😅 Just popping in to say yep, my org intends to maintain […]. Since you mentioned the generated API, could I […]

I think it refers to this, which is currently being ignored by […]

I'll update this PR very soon.
Force-pushed from 24d5a60 to ecfa80e.
This test is failing because […]. Does anyone know what the issue might be? Will investigate later today or tomorrow…
#[allow(clippy::derive_partial_eq_without_eq)]
#[derive(Debug, Default, PartialEq, Clone)]
pub struct Envelope<'a> {
    pub public_key: Option<keys_proto::PublicKey<'a>>,
    pub payload_type: Cow<'a, [u8]>,
    pub payload: Cow<'a, [u8]>,
    pub signature: Cow<'a, [u8]>,
}

impl<'a> MessageRead<'a> for Envelope<'a> {
    fn from_reader(r: &mut BytesReader, bytes: &'a [u8]) -> Result<Self> {
        let mut msg = Self::default();
        while !r.is_eof() {
            match r.next_tag(bytes) {
                Ok(10) => msg.public_key = Some(r.read_message::<keys_proto::PublicKey>(bytes)?),
                Ok(18) => msg.payload_type = r.read_bytes(bytes).map(Cow::Borrowed)?,
                Ok(26) => msg.payload = r.read_bytes(bytes).map(Cow::Borrowed)?,
                Ok(42) => msg.signature = r.read_bytes(bytes).map(Cow::Borrowed)?,
                Ok(t) => { r.read_unknown(bytes, t)?; }
                Err(e) => return Err(e),
            }
        }
        Ok(msg)
    }
}

impl<'a> MessageWrite for Envelope<'a> {
    fn get_size(&self) -> usize {
        0
        + self.public_key.as_ref().map_or(0, |m| 1 + sizeof_len((m).get_size()))
        + if self.payload_type == Cow::Borrowed(b"") { 0 } else { 1 + sizeof_len((&self.payload_type).len()) }
        + if self.payload == Cow::Borrowed(b"") { 0 } else { 1 + sizeof_len((&self.payload).len()) }
        + if self.signature == Cow::Borrowed(b"") { 0 } else { 1 + sizeof_len((&self.signature).len()) }
    }

    fn write_message<W: WriterBackend>(&self, w: &mut Writer<W>) -> Result<()> {
        if let Some(ref s) = self.public_key { w.write_with_tag(10, |w| w.write_message(s))?; }
        if self.payload_type != Cow::Borrowed(b"") { w.write_with_tag(18, |w| w.write_bytes(&**&self.payload_type))?; }
        if self.payload != Cow::Borrowed(b"") { w.write_with_tag(26, |w| w.write_bytes(&**&self.payload))?; }
        if self.signature != Cow::Borrowed(b"") { w.write_with_tag(42, |w| w.write_bytes(&**&self.signature))?; }
        Ok(())
    }
}
Wow this is actually so decent to look at!
haha yup
use pb_rs::{types::FileDescriptor, ConfigBuilder};
use std::path::{Path, PathBuf};

fn main() {
We will need a CI check that ensures that the generated code is not modified! Ideally, the CI check can use glob patterns so we auto-detect when new `.proto` files are added to the codebase.
I'm not too familiar with GitHub Actions, so I have some questions. Do we want the build to fail when these files are modified? Is there a way to turn that off when we legitimately need to update these files? Also, I think when someone adds a new generated file and `.proto` file, the build should not fail?

Possible solutions:

- Make a warning comment on the PR when `proto/*` files are added or modified.
- A simpler approach, which I've seen some projects do (mitmproxy for example), is to add these `proto/` directories to `.gitignore`.

Thoughts?
I think we should never modify these files. I was thinking something along the lines of:

- Run protobuf compiler with some glob pattern capturing all `.proto` files
- Run `git diff --exit-code`

If anyone modified a generated file, `git diff` will fail the workflow. This way, CI will only pass if there is a generated code file for each `.proto` file that is completely untouched, resulting in the exact same behaviour as if we were to generate it at compile-time.
Well, that is a bit of a weird one. Does the workaround mentioned in the issue solve things or are we blocked here?

Not a blocker. We're good as long as we only use […]

This pull request has merge conflicts. Could you please resolve them @kckeiks? 🙏
Let me know when this is ready for another full review!
core/CHANGELOG.md
Outdated
@@ -1,3 +1,9 @@
# 0.39.0 [unreleased]
-# 0.39.0 [unreleased]
+# 0.38.1 [unreleased]
I'd hope that we can ship this as a backwards-compatible change.
transports/noise/src/lib.rs
Outdated
@@ -57,12 +57,15 @@

 mod io;
 mod protocol;
+#[allow(clippy::derive_partial_eq_without_eq)]
This is also applied within the file, isn't it?
@@ -1,3 +1,9 @@
# 0.42.0 [unreleased]
-# 0.42.0 [unreleased]
+# 0.41.1 [unreleased]
Same here, it should be a backwards-compatible change.
core/Cargo.toml
Outdated
@@ -54,7 +54,7 @@ rmp-serde = "1.0"
serde_json = "1.0"

[build-dependencies]
-prost-build = "0.11"
+pb-rs = "0.10.0"
We should no longer need this, right?
Force-pushed from 512dc87 to 630c76f.
Force-pushed from fc28b0a to 0a0c42a.
Please merge master instead of force-pushing. We squash merge anyway and it makes reviews easier :)
Nice, this is definitely heading in the right direction! Thank you!
Things to look out for:
- Before we convert more crates, I'd like to verify that we can reasonably easily write a CI script that checks the generated code for modifications. I've just tried it locally and `pb-rs **/*.proto` seems to do the trick! I'd like to see this sooner rather than later in the progress of this PR. I think a good next step would be to move all proto files into the directories we want, generate the corresponding Rust code and add the CI job that verifies it all. From there, we can continue to convert more crates to actually use the newly generated code.
- We currently rely on `prost-codec` in various crates. Now that we are no longer using `prost`, that will have to be renamed (and re-released) to `quick-protobuf-codec`. I'd like to verify that we can actually build the same abstraction, i.e. an implementation of `asynchronous-codec::Encoder` and `asynchronous-codec::Decoder` for structs generated using `quick-protobuf` (see the sketch after this comment).
- In addition to the CI script, we can remove the installation of `protoc` from all workflows.

Thanks again for tackling this, I am very excited to land this improvement. It is a big plus for our CI pipeline and removes an annoying step for people to get started hacking on rust-libp2p.
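On the second point, here is a rough sketch, not the eventual `quick-protobuf-codec` implementation, of what the `Encoder` half could look like for any `quick_protobuf::MessageWrite` type, using an unsigned-varint length prefix. The `ProtobufEncoder` name is made up, and a matching `Decoder` would additionally have to deal with the borrowed `Cow` fields in the generated structs.

```rust
use std::{io, marker::PhantomData};

use asynchronous_codec::Encoder;
use bytes::BytesMut;
use quick_protobuf::{MessageWrite, Writer};

/// Hypothetical codec type, generic over the generated message it encodes.
pub struct ProtobufEncoder<M> {
    _marker: PhantomData<M>,
}

impl<M> ProtobufEncoder<M> {
    pub fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

impl<M: MessageWrite> Encoder for ProtobufEncoder<M> {
    type Item = M;
    type Error = io::Error;

    fn encode(&mut self, item: M, dst: &mut BytesMut) -> Result<(), io::Error> {
        // Length-prefix the frame with an unsigned varint, as libp2p protocols expect.
        let mut varint_buf = unsigned_varint::encode::usize_buffer();
        dst.extend_from_slice(unsigned_varint::encode::usize(item.get_size(), &mut varint_buf));

        // Serialize only the message fields; no extra length prefix.
        let mut buf = Vec::with_capacity(item.get_size());
        item.write_message(&mut Writer::new(&mut buf))
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e.to_string()))?;
        dst.extend_from_slice(&buf);
        Ok(())
    }
}
```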
-let base_64_encoded = "CAESQL6vdKQuznQosTrW7FWI9At+XX7EBf0BnZLhb6w+N+XSQSdfInl6c7U4NuxXJlhKcRBlBw9d0tj2dfBIVf6mcPA=";
+let base_64_encoded = "RAgBEkC+r3SkLs50KLE61uxViPQLfl1+xAX9AZ2S4W+sPjfl0kEnXyJ5enO1ODbsVyZYSnEQZQcPXdLY9nXwSFX+pnDw";
 let expected_peer_id =
     PeerId::from_str("12D3KooWEChVMMMzV8acJ53mJHrw1pQ27UAGkCxWXLJutbeUMvVu").unwrap();
Why does this test change? We should not need to change this test.
This changes because of this #3066 (comment). I've been thinking about this and feel uneasy about it. Should we instead use this pattern tafia/quick-protobuf#202 (comment)? That way we're consistent with the official encoding.
> Should we instead use this pattern tafia/quick-protobuf#202 (comment)? That way we're consistent with the official encoding.

I think so, yes. We need to be compatible here, so this test should not need changing!
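For reference, a minimal sketch of that pattern, written against any type implementing `quick_protobuf::MessageWrite` (such as the structs generated in this PR): writing the fields through the trait directly avoids the varint length prefix that `serialize_into_vec` prepends, which is what changed the base64 fixture above.

```rust
use quick_protobuf::{MessageWrite, Writer};

/// Serialize a message without a leading varint length prefix, so the bytes
/// match the wire format the previous prost-based code produced.
fn encode_unprefixed<M: MessageWrite>(msg: &M) -> Vec<u8> {
    let mut buf = Vec::with_capacity(msg.get_size());
    let mut writer = Writer::new(&mut buf);
    // `MessageWrite::write_message` writes only the fields; it is
    // `Writer::write_message` that would additionally prepend the length.
    msg.write_message(&mut writer)
        .expect("writing to a pre-allocated Vec should not fail");
    buf
}
```

With that, the existing fixture should not need to change.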
pub(crate) fn unknown_key_type(key_type: i32) -> Self {
    Self {
        msg: format!("unknown key-type {key_type}"),
        source: None,
    }
}
🎉
@@ -18,12 +18,12 @@
 // FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 // DEALINGS IN THE SOFTWARE.

-use crate::structs_proto;
+use crate::protos::structs as structs_proto;
I am not a fan of renaming modules on import. It would be cool if we could just access all structs under a `proto` namespace.

To achieve that, would it make sense if we:

- Rename the `protos` module to `generated`
- Create a `proto` module in `lib.rs` as `mod proto {}`
- Re-export all types that we need from there to flatten out the module hierarchy (see the sketch just below this list)
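A sketch of what that layout could look like in `lib.rs`; the module bodies below are placeholders, and the generated files themselves stay untouched.

```rust
// lib.rs

// pb-rs output lives here, exactly as generated.
mod generated {
    pub mod keys {
        // generated `KeyType`, `PublicKey`, `PrivateKey`, ...
    }
    pub mod structs {
        // generated message types, ...
    }
}

// Hand-written facade that flattens the hierarchy, so call sites can write
// `use crate::proto::PublicKey;` instead of renaming modules at every import.
pub(crate) mod proto {
    pub(crate) use super::generated::keys::*;
    pub(crate) use super::generated::structs::*;
}
```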
This pull request has merge conflicts. Could you please resolve them @kckeiks? 🙏

I will have some bandwidth at the end of this week to work on this.

As a heads-up: Response times on our end might be slow over the next 2-3 weeks. Personally, I'll be pretty much off the grid for about two weeks from next Monday.

Moving to #3312
@mxinden I've been thinking. Instead of releasing another crate, would you accept a PR to https://github.com/mxinden/asynchronous-codec with a feature-flag for decoding protobuf messages via […]?

This pull request has merge conflicts. Could you please resolve them @kckeiks? 🙏

Sure thing! Happy to merge a patch into […]

I see your concern, but given that protobuf serializations themselves aren't self-descriptive in that way, some strategy is needed. I think exposing a codec that always uses varint for that is a sensible choice. We can try and generalize that once someone comes along with a different use case. @kckeiks Would you be up for extending […]?

Why not :)

@thomaseizinger @mxinden I'm running into an issue over in mxinden/asynchronous-codec.
Ah damn. I see three solutions: […]

Looking at it from first principles, (3) is the cleanest but requires us to have another crate again. Really, […]

We can't do that because it would not be compatible on the network layer with other peers / implementations.
Sounds good. I prefer (3) as well.

Closing here in favor of #3312.
Description
According to perftest, `quick-protobuf` is faster than `protobuf`. Since Protobuf is fundamental to `libp2p`, the performance of the library we choose is important. Also, the `quick-protobuf` generated API is much simpler than `protobuf`'s.

Open Questions

- Pros and cons of `quick-protobuf`. List of unsupported features.
- Is it likely that we will need any of those unsupported features?
- Any other concerns?
Change checklist