Conversation
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Sorry for the delay. I'm still planning on reviewing this. I'm behind.
I just learned I can remove these specializations with something similar to
template <class... Ts> struct tuple {};
template <class T, class... Ts>
struct tuple<T, Ts...> : tuple<Ts...> {
tuple(T t, Ts... ts) : tuple<Ts...>(ts...), tail(t) {}
T tail;
};
not 100% sure though, as I might run across problems creating the result in get()
In general we don't have this level of template programming in the code base, as it's incredibly hard to read. Can you try to figure out a way to do this without so many templates? If it's not obvious can you describe what you are trying to do?
The most-offending CompositeBuffer deserializer template has been removed and replaced with explicit domain deserializers.
All other deserializers are subclasses of Deserializer<T> interface and deserialize into instance of T.
The encoding template functions have been documented - in general I wanted to have a single method that could work as encode(string), encode(int8), encode(bool) vs having encodeString / encodeInt8 / encodeBool, as IMHO the latter makes it harder to write generic code
Sounds good. Let's see how it looks on the next pass.
@adamkotwasinski sorry for the really long delay. I have time to review this now. This is too much code to review in a single PR. Can you potentially cut this down to whatever infrastructure is required for a few Kafka messages and an incomplete codec and associated tests? Then we can review just that to make sure we are on the same page and go from there?
@mattklein123 Done! Left only one request type, but it's complex enough to see how the versioning / changing structure is handled.
mattklein123
left a comment
This is so exciting! I did a first very high level pass. I apologize in advance, but in the beginning I'm going to be focusing on a bunch of readability things which will ultimately help me drill down and then do a better real code review, so please have patience while we get through this "annoying" stuff; I promise everything will come out better in the end. I left a bunch of high level readability comments, but for this next round I would really like to focus on:
- Having doc comments in all header files
- Starting to split up some of the large files where applicable
- Figure out how to reduce some of the more extreme cases of template programming. In general we don't use templates that heavily in Envoy. Of course we do sometimes, but some of the cases here, though impressive, had my eyes bleeding. 😉 (If it turns out that it's really impossible to do without hard core templates please make sure the code is extremely well commented and we can discuss).
Anyway, thanks for working on this and I promise I will be much more responsive with reviews from here on out.
Please revert this in this PR and also remove any config library from this PR
Can we strip this down in this PR to just the request types we are implementing?
but then I cannot use RequestType::OffsetFetch as INT16 in my OffsetFetchRequest constructors .... or am I missing something?
Can't you just use int16_t? All I'm saying is to use the system types which are really clear? I don't see the point in the typedef when we use these types all over the code base?
nit: Can we remove comments like these? We don't do this elsewhere. The names of functions, namespaces, etc. should generally be self describing. Same elsewhere.
Do these typedefs provide any value? Can we just kill them all and use the direct types? For some of the ones down at the bottom, if you want those to save typing, use CamelCase such as NullableBytes
Sure!
My rationale was that I wanted to use "kafka types" such as INT8 when I use things coming from Kafka itself. e.g. request's api_version is INT16, but I could have made an explicit typedef uint16_t api_version too
I would just avoid typedefs for the integer cases. int8_t is pretty specific for example.
Can we move the buffer to avoid the copy? More generally, would it be faster to precompute the size before encoding so we can encode directly into the output?
Will do.
I was considering that, but then it needs a method for each request type (& substructure) that's trivially returning something along the lines of
size_t FetchRequest::size() {
  return sizeof(field1) + sizeof(field2) + substructure.size();
}
I would like to defer this problem until we decide how we are actually going to serialize/deserialize the data. First let's solve how we are going to manage encode / decode operations; computeSize will be yet another operation that follows precisely the same pattern (after all, our requests/responses are trivial trees - so let's decide on tooling to traverse the trees first).
Makes sense?
Yup that's fine, I would just put a TODO comment to remind us of a potential optimization later.
Can you merge master? I think you need to use the new VLA macros that were added recently.
In general, I think this code would benefit from starting to split it into multiple files. Can we potentially have a file for base objects/classes, and then a file per message type since they seem to be fairly large? I think that would make the code easier to read and review as it grows.
Naming them request_type.h and putting them in a messages subdirectory.
I envision one header containing both the request & response for the same type.
I think it starts to be pretty obvious here that OffsetCommitPartitionV0Buffer, OffsetCommitPartitionV1Buffer, OffsetCommitTopicV0Buffer, OffsetCommitTopicV1Buffer & co. are very similar to each other: each is composed of 2, 3, or 4 sub-deserializers, takes their deserialization results, and just bundles them up into the resulting T from Deserializer<T> via a { .... } constructor
I think we can attack it in a few different ways:
a) keep doing the same stuff, and just cope with the code bloat
b) introduce a new class CompositeDeserializer that would take a vector (tuple?) of sub-deserializers, something like
class CompositeDeserializer : public Deserializer<T> {
private:
  std::vector<Deserializer<?>> delegates_; // I would be storing Deserializer<int8> next to Deserializer<string> here, so I might need a pointer I guess
public:
  bool ready() { return delegates_.back().ready(); } // same thing, ready when last deserializer is full
  size_t feed(const char*& buffer, uint64_t& remaining) {
    size_t result = 0;
    for (Deserializer& d : delegates_) {
      result += d.feed(buffer, remaining);
    }
    return result;
  }
  T get() const {
    std::vector<void*> results = ...; // get result from each of delegates and put them in vector
    return T{results}; // needs a constructor that would downcast each of elements from void* to proper type
  }
};
What I don't like here is that I have void* flowing around, requiring downcasting, and then depending on the constructor to fill in the request's (or other subelement's) fields properly.
c) Common super classes looking like "deserializer composed of 2 deserializers":
template <typename T, typename Delegate1T, typename Delegate2T>
class CompositeDeserializerComposedOf2Deserializers {
private:
  Delegate1T d1;
  Delegate2T d2;
public:
  bool ready() { return d2.ready(); }
  size_t feed(....) {
    return d1.feed(....) + d2.feed(....);
  }
  T get() {
    return { d1.get(), d2.get() };
  }
};
drawbacks: it's the template solution back again ;)
d) bonus consideration: make OffsetCommitRequest abstract and add OffsetCommitRequestV0, OffsetCommitRequestV1 concrete classes - it could alleviate some of the problems present in b), as the constructor would always know that field 0 is e.g. topic name, instead of needing to check with api_version first
Hmm yeah, I think I'm starting to see the issue here, agree that b) is not great. I'm not completely opposed to templates, if you think it's the best way. I would pick what you think is best for now (with an eye towards trying to avoid complex templates if possible) and just make sure all of the header files are really well commented. Then I can take another pass? Sound good?
Sure! Thank you - I think it's most visible when we actually have N semi-similar classes; then the benefit of a (templated) superclass becomes apparent.
The composite deserializers have been added in serialization_composite.h and are tested in serialization_test.cc
Replaced the offsetcommit deserializers & request header deserializer (also composed of 4 delegates).
one can write:
class OffsetCommitRequestV0Deserializer
    : public CompositeDeserializerWith2Delegates<
          OffsetCommitRequest,
          StringDeserializer,
          ArrayDeserializer<
              OffsetCommitTopic,
              CompositeDeserializerWith2Delegates<
                  OffsetCommitTopic,
                  StringDeserializer,
                  ArrayDeserializer<
                      OffsetCommitPartition,
                      CompositeDeserializerWith3Delegates<
                          OffsetCommitPartition,
                          Int32Deserializer,
                          Int64Deserializer,
                          NullableStringDeserializer>>>>> {};
but I think it makes it even harder to understand?
Yes that is not good. At this point, please just make the code as readable as you think it can be, while trying to remove egregious templates. When this is done I will take another pass and offer different suggestions if I see them. It will be much easier to do this with all the other changes I have asked for such as comments, etc.
The remaining big-template classes will be present only in messages/*.h files, as we need to define the structure somehow.
The end code is actually pretty similar to how the Java client code looks ( https://github.com/apache/kafka/blob/2.1/clients/src/main/java/org/apache/kafka/common/requests/OffsetCommitRequest.java#L58 ), but Java, lacking templates, forces them to use raw Objects flying around and (nicely hidden) downcasts, just like here https://github.com/apache/kafka/blob/2.1/clients/src/main/java/org/apache/kafka/common/protocol/types/Type.java#L211
Other changes like comments, splits, renames etc. have been already applied.
@adamkotwasinski OK so the PR is ready for another pass?
Can you fix DCO before I start reviewing again?
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
- remove unnecessary request type constants
- remove garbage comments
- move operator<< helper to separate header
- move requests (currently only offset_fetch) to separate header
- remove unnecessary typedefs
- add missing documentation
- improve GeneratorMap

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>

- properly access buffer slices
- remove CompositeDeserializer and replace it with expanded classes
- documentation

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>

- some renames
- more comments
- missing include

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
mattklein123
left a comment
Nice, flushing next round of comments. Will get into the serialization headers next.
/wait
namespace NetworkFilters {
namespace Kafka {

// abstract codecs for requests and responses
/**
 * Kafka message decoder
 * @tparam MT message type (Kafka request or Kafka response)
 */
template <typename MT> class MessageDecoder {
nit: s/MT/MessageType, same below.
namespace NetworkFilters {
namespace Kafka {

// functions present in this header are used by request / response objects to print their fields
nit: del
Also, this seems general and not only for Kafka. Can we move this into https://github.com/envoyproxy/envoy/tree/master/test/test_common somewhere? Perhaps https://github.com/envoyproxy/envoy/blob/master/test/test_common/printers.h? Not sure of best place.
/**
 * Kafka request type identifier (int16_t value present in header of every request)
 * @see http://kafka.apache.org/protocol.html#protocol_api_keys
 */
enum RequestType : int16_t {
Can I keep it this way?
Making an enum class out of request type would require adding another value for "unknown request", and possibly forces us to keep two fields in UnknownRequest - RequestType for the unknown value and then int16_t with the real value that was received.
The other idea would be to make N constexprs like
constexpr int16_t PRODUCE_REQUEST_TYPE{0};
....
constexpr int16_t OFFSET_COMMIT_REQUEST_TYPE{8};
extremely related: https://stackoverflow.com/questions/1965249/how-to-write-a-java-enum-like-class-with-multiple-data-fields-in-c
@@ -0,0 +1,25 @@
#pragma once

#include <vector>

namespace Kafka {

/**
 * Represents a sequence of characters or null. For non-null strings, first the length N is given as
From a higher layer code perspective, does the user care about encoding the length inside this string? Don't they just want to operate on a string or nothing? What is the intention of these types? IMO the encoding is an implementation detail. What is actually contained in these types? Can we clarify? Same for below.
Good point! I can remove that in this header; this stuff is going to be present in deserialization.h anyway (as there I want to state why I'm parsing the way I am)
 * @param remaining remaining data in buffer, will be updated by parser
 * @return parse status - decision what should be done with current parser (keep/replace)
 */
virtual ParseResponse parse(const char*& buffer, uint64_t& remaining) PURE;
Small readability comment: I would have a struct ParseState or something like that, that contains the buffer pointer and remaining length, and then pass that by reference for modification. Alternatively and possible better, is it possible to just pass an absl::string_view and modify the string view (or return a new one)?
 * Consumes INT32 bytes as request length and updates the context with that value
 * @return RequestHeaderParser instance to process request header
 */
ParseResponse parse(const char*& buffer, uint64_t& remaining);
For overridden functions:
- Use the override keyword
- Don't duplicate the doc comment from the interface header
- Precede the overrides with a comment similar to // Extensions::Networkfilter::Kafka::Parser

Please audit for this elsewhere.
 * @param BT deserializer type corresponding to request class (should be subclass of
 * Deserializer<RT>)
 */
template <typename RT, typename BT> class RequestParser : public Parser {
nit: Spell out RT, BT, etc. same elsewhere.
public:
virtual ~RequestCallback() = default;

virtual void onMessage(MessageSharedPtr) PURE;
namespace NetworkFilters {
namespace Kafka {

/**
Why are these templates actually needed? AFAICT all they do is string together multiple feed calls. Can't feed be an interface method, and then there can be a class that takes a list/vector of feeders and generically does what all of this template code does?
I think what you described is basically point b) of #4950 (comment)
The feed part is not a problem, but I'm getting problems with constructing results in get().
If I keep a vector of Deserializer-s then their return type is going to have to be something like void* or anything sufficiently generic, and then I'd need to convert these results into Request's constructor arguments.
Basically I would want to have something like
std::vector<Deserializer<?>> delegates_;

ReturnType get() const {
  return { delegates_[0].get(), delegates_[1].get(), delegates_[2].get(), .... };
}
At a minimum, feed will be changed to be more generic.
When it comes to get, I will take a look into the possibility of using an array instead of a vector - this way I think I'll be able to do some templating to change array<N>{ deserializer1, .... } into an argument list.
OK I see the problem. Alright do what you can and I will take another pass in the next round. Sorry for not seeing the get() issue.
/retest

🔨 rebuilding
mattklein123
left a comment
Thanks, is this ready to go other than figuring out why coverage is failing? Maybe try merging master? If that doesn't work how can we help debug?
/wait
// TODO(adamkotwasinski) discuss capturing the data as-is, and simply putting it back
// this would add ability to forward unknown types of requests in cluster-proxy
/**
 * It is impossible to encode unknown request, as it is only a placeholder.
So this can never happen in practice? If so should it be NOT_IMPLEMENTED?
/wait

🔨 rebuilding
…et picked up by gcovr Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
The coverage should be fixed now, but I still need to address the 100% coverage & other comments
Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
- Put request parse failures in separate objects;
- Simplify message hierarchy;
- Remove message.h and make Encoder/Parser/ParseResponse templated to support Response objects in future

Signed-off-by: Adam Kotwasinski <adam.kotwasinski@gmail.com>
/wait

/retest

🔨 rebuilding

@mattklein123
mattklein123
left a comment
Thank you so much @adamkotwasinski for powering through this over the last many months. I think where we ended up here is awesome and will be very sustainable as Kafka adds new message types. At this point I think we should ship and iterate. Super exciting!
Description: Placeholder for Kafka codec, API inspired on some level by Mongo codec (internals partially by Redis though)
Risk Level: medium
Testing: so far manual, by setting up Kafka cluster & Kafka clients (I keep filter code that's relatively trivial and not part of this PR)
Recommended (?) order of reading:
- codec.h - abstract codecs (there will be one for kafka request & response - to be used in onData/onWrite filter methods)
- request_codec.h / .cc - kafka request encoder & decoder for requests - decoder notifies listeners (similar to mongo) and uses a parser-loop to handle complex messages
- parser.h - parser is a parser (the name isn't perfect) - it's a stateful thingy that keeps current buffering state, and can tell if it's done (returns a message) or should be changed (nextParser)
- kafka_types.h - kafka data types
- kafka_protocol.h - constants coming from kafka (request-only now)
- serialization.h - how char* gets translated into kafka types + complex buffer template (that combines N buffers, and then combines their result into one return object) and encoder
- kafka_request.h / .cc:
  - request header (4 fields present in every msg)
  - request context keeping that header that's passed in the parser chain mentioned above
  - request mapper (basically a mapper from request_type x request_version -> parser)
  - message start parser (consumes message length)
  - header parser (consumes request header)
  - some request parsers (keys 0-9 with versions up to Kafka 0.11)
  - (at the bottom) UnknownRequest for requests that couldn't be recognised by mapper

Tests:
- serialization_test - trivial kafka types, checked in both ways: feed whole buffer at once & trickle by one byte (to ensure state is being kept by buffer properly - compare with redis filter that appears to be doing the same)
- kafka_request_test - mapper testing; the details of parse loop start parser -> header parser -> specific parser
- request_codec_test - full tests for each of currently implemented request type/version combos
serialization_test- trivial kafka types, checked in both ways: feed whole buffer at once & trickle by one byte (to ensure state is being kept by buffer properly - compare with redis filter that appears to be doing the same)kafka_request_test- mapper testing; the details of parse loopstart parser->header parser->specific parser;request_codec_test- full tests for each of currently implemented request type/version combos