Difference with pinterest/elixir-thrift #55

Open
samuelmanzanera opened this issue Aug 1, 2019 · 8 comments

Comments

@samuelmanzanera

samuelmanzanera commented Aug 1, 2019

Hello :)
Really great work for the Thrift implementation in Elixir.
But I can't tell which repo and implementation is better or more up to date:
this one or pinterest/elixir-thrift.

Why are there two repositories for the same implementation? I ask because this one has the advantage of avoiding code generation.

Thanks

@scohen
Collaborator

scohen commented Aug 1, 2019

Use elixir-thrift, it’s a pure Elixir library that’s both easier to use and much faster.

This library came first and just wraps the Erlang implementation, a choice made at the time to get an Elixir-friendly library up and running more quickly.

Make no mistake, this library also does code generation, but via macros, which incurs a lot of compile-time cost.

@samuelmanzanera
Author

Ok.
Thank you very much @scohen

Just a dumb question: why do Thrift implementations do code generation?

@scohen
Collaborator

scohen commented Aug 1, 2019

That's not a dumb question, it's a very good question.

Thrift starts out with an interface definition language or IDL that specifies data structures and functions that make up an RPC. Those data structures don't actually exist outside of the Thrift IDL that you write, so they need to be generated in order for you to work with them. So, generating the structs and the modules that hold the RPC functions is one part of code generation that, I believe, is required by all languages that have Thrift implementations.
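
As a rough illustration of what such an IDL looks like, here is a minimal sketch. Every name in it (`User`, `UserNotFound`, `UserService`, `get_user`) is hypothetical and not taken from this thread or from either library.

```thrift
// Hypothetical IDL for illustration only; none of these names come from the thread.

// A data structure. It exists only in this file until code is generated for it.
struct User {
  1: i64 id,
  2: string name,
}

// An error that an RPC function may raise.
exception UserNotFound {
  1: string message,
}

// The functions that make up the RPC.
service UserService {
  // Takes an id and returns a User, or raises UserNotFound.
  User get_user(1: i64 id) throws (1: UserNotFound not_found),
}
```

From a file like this, a generator would typically produce a struct for `User` and a module holding the `UserService` functions; the exact output depends on the implementation.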

But elixir-thrift goes much further than that with its code generation, going so far as to generate a protocol that's specific to the IDL you wrote. That's because Erlang has a binary-matching optimization (the compiler's bin_opt_info option reports on it; detailed here) that is impractical to take advantage of by hand for every permutation of IDL out there. For this reason, the protocol code is generated for you. The optimization makes a significant difference to the decoding time of Thrift structs.

Another advantage of generating all these files and having them live in your project is that mix only recompiles the generated files when they change, so you can change the IDL and not suffer a recompile of all of your Thrift code. Riffed, on the other hand, doesn't produce artifacts like this and must recompile all Thrift modules whenever one of them changes. Pinterest has a pretty large Thrift repository, and compiling all of the Thrift modules wasted a lot of developer time.

@samuelmanzanera
Author

Thank you very much for this deep explanation.

I was wondering whether, to avoid code generation, the IDL definition could be used just as a reference: to decode data from a server and relay it to a specific handler (RPC method decoding only: authorized methods, payload...), and for the client to specify which method to reach on the server.
The project I'm working on will require hot-reloaded changes over time, probably to both the IDL and the implementation, so I'm looking for solutions that minimize the cost of changes. (I'm also benchmarking and trying out other solutions such as Ranch + ETF, BertRPC, Ranch + FlatBuffers, Ranch + Avro.)

But I get the point that the generated code enables the Erlang optimization and can save time, reduce complexity, and avoid a complete reload of modules.

@scohen
Collaborator

scohen commented Aug 2, 2019

> will require hot-reload changes over the times and probably the IDL and the implementation

This could easily break if your data structures change on either end. Hot code reloading is quite fragile, and can only be done safely in the context of a release, in which case you're shipping new code anyway. That said, you can take steps to make structs and messages backwards compatible in Thrift.

Formats like ETF and JSON allow anything to be sent over the wire. That's a blessing and a curse: you are totally free to encode any data you wish, but it's up to you to parse that data in a meaningful fashion.

Avro gets around this by putting the schema inside the message, which might be a good compromise for you (if there's a good Avro implementation for Elixir). I don't think Elixir would benefit from the approach taken by FlatBuffers, since it doesn't operate like C does, where structs can be packed directly in memory (I might be wrong here, my knowledge of FlatBuffers is pretty limited).

If you seek performance and can get away with it, ETF + Ranch should be very fast and easy to work with, provided you define your messages well.

@samuelmanzanera
Author

Thank you for your reply.
Indeed, hot code reloading needs to be handled with caution. The app I am building involves a lot of decentralization, including code decentralization, so hot code reloading is important, but it must go through a "release" or "approval changes" step...

For my project, I need to rely on binary protocols, so I studied Thrift, ETF, FlatBuffers, Protobuf, Avro, ... At first I wanted a schema-based approach to avoid untrusted messages, but I'm wondering whether this is a good approach in terms of managing self-evolving data structures in a decentralized manner.

About FlatBuffers: indeed, with the Elixir implementation (https://github.com/wooga/eflatbuffers) it's not like in C/C++, because I get about the same performance as with Avro, so it's not really interesting. Avro data is also smaller. Unfortunately, there is not yet an RPC implementation for it in Erlang/Elixir, so I based my assumptions on Ranch + Avro encoding.
(Erlang implementation: https://github.com/klarna/erlavro)

As for performance, Thrift and Ranch + ETF (or BertRPC) are indeed very similar, and both are better than Ranch + Avro.

Thanks again @scohen

@samuelmanzanera
Author

@scohen, in your view, what are the steps to make structs and messages backwards compatible in Thrift?

@scohen
Collaborator

scohen commented Aug 5, 2019

> but must be used "release" or "approval changes"...

So, to be perfectly clear, when I'm talking about a "release", I'm speaking of a specific Erlang concept. There are also Elixir versions of this, like Distillery and now native Elixir releases.

So, in the end, our Thrift server is just a wrapper around Ranch, which is generally excellent. You'll find that it has more than adequate performance and won't be a bottleneck for any of your needs. You'll also find that nothing will beat ETF with respect to speed of decoding and encoding; it's fundamental to how Erlang distribution works and is heavily optimized.

If you want to make your thrift structs backwards compatible:

  1. Never mark a field as required
  2. Number your fields
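
A minimal sketch of those two rules, assuming a made-up `Event` struct (the struct and its field names are hypothetical, not from this thread):

```thrift
// Hypothetical struct for illustration only; names are not from the thread.
struct Event {
  // Every field carries an explicit number and none is marked required.
  1: optional string id,
  2: optional i64 timestamp,

  // Added in a later revision under a new, previously unused number.
  // Readers built from the older IDL skip field numbers they do not know,
  // and newer readers see this field as unset when an older writer omits it.
  3: optional string source,
}
```

The field numbers are what identify fields on the wire, so this sketch also assumes existing numbers are left untouched once they are in use.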
