Beats input support #551
base: main
# Conflicts:
# README.md
# src/main/java/net/logstash/logback/encoder/CompositeJsonEncoder.java
Thanks for the contribution! Love the idea. Allow me some time to investigate it. I want to understand the Lumberjack protocol a bit and do some testing myself before merging it.
Ok, no problem, thanks for replying! Just keep in mind that I kept my implementation simple: it does not account for windows and other advanced Lumberjack features, but it is sufficient to send logs as JSON to Graylog. I tested it and it works on Graylog 3.3. I also compared it with the raw data from Filebeat's output, and this implementation seems to be sufficient.
A few comments already... Lumberjack is not only a data format, it is also a protocol with windows and ACKs (like TCP). The ultimate objective is to provide a reliable transport mechanism on top of TCP, by which the sender is notified when the receiver has effectively secured/processed the data. Changes are therefore also required to the
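To make the window/ACK part concrete, here is a minimal Java sketch of the Lumberjack v2 frames involved. It assumes the v2 frame layout as used by Beats: a version byte `'2'`, a `'W'` frame carrying the window size, `'J'` frames carrying a sequence number, a payload length and the JSON payload, and `'A'` frames sent back by the receiver with the acknowledged sequence number, all integers being 4-byte big-endian. `LumberjackV2Frames` and its methods are hypothetical names, not code from this PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper illustrating Lumberjack v2 framing (assumed layout, see lead-in).
 * ByteBuffer writes and reads integers big-endian by default.
 */
final class LumberjackV2Frames {
    static final byte VERSION = '2';
    static final byte WINDOW = 'W';
    static final byte JSON = 'J';
    static final byte ACK = 'A';

    private LumberjackV2Frames() {}

    /** '2' 'W' followed by the window size: how many events may be sent before waiting for an ACK. */
    static byte[] windowFrame(int windowSize) {
        return ByteBuffer.allocate(6)
                .put(VERSION).put(WINDOW)
                .putInt(windowSize)
                .array();
    }

    /** '2' 'J', 4-byte sequence number, 4-byte payload length, then the UTF-8 JSON payload. */
    static byte[] jsonFrame(int sequence, String json) {
        byte[] payload = json.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(10 + payload.length)
                .put(VERSION).put(JSON)
                .putInt(sequence)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Parses one '2' 'A' frame sent back by the receiver and returns the acknowledged sequence. */
    static long readAck(ByteBuffer in) {
        byte version = in.get();
        byte type = in.get();
        if (version != VERSION || type != ACK) {
            throw new IllegalArgumentException("Unexpected frame: " + (char) version + (char) type);
        }
        // The sequence is an unsigned 32-bit value on the wire.
        return Integer.toUnsignedLong(in.getInt());
    }
}
```

The sender writes a window frame, then up to that many JSON frames, and may only continue once an ACK covering them has been read back; that read-back loop is the part a plain encoder cannot provide.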
Ok, thanks for the feedback, I will fix it in my free time as soon as I find some. @brenuart, do you have any other comments or suggestions about the rest of the code or the code style?
Converted to draft as it will need some rework to fully support Lumberjack |
Is this for Lumberjack protocol v1 or v2? Whichever version is used needs to be explicitly stated in the documentation, and potentially in the class names (e.g. LumberjackV2...). (edit: I now see this is for v2 by looking at the constants. Can you make this clearer in the documentation and class naming?)

Is there a specification for the protocol anywhere? If not, what is the best reference implementation for reverse engineering? What did you use as a guide when implementing this?

Regarding payload converters... I think the current design is "inside-out". Conceptually, I think of the Lumberjack encoding as wrapping the JSON inside of it. However, the current design has the Lumberjack converter inside the JSON encoder, which seems "inside-out" to me. I think the encoding required by the Lumberjack protocol would be better modeled as a separate encoder that delegates to the JSON encoder. For example, something like this:

<encoder class="net.logstash.logback.encoder.LumberjackEncoder">
    <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder"/>
</encoder>

Additionally, I don't think the payload converter pattern is needed in general, since a wrapping encoder is the preferred pattern for any use case that requires converting the JSON output.

Another alternative would be to perform the encoding required by the Lumberjack protocol inside the appender itself, without exposing/requiring a separate encoder. Is there ever a need to do Lumberjack encoding over something other than TCP? Also, the beats appender and the current Lumberjack converter seem very tightly coupled in the current implementation (as evidenced by the installConverter method and the BeatsEncoderWrapper). This is a good argument for performing the Lumberjack encoding within the appender itself, since it is required when using the beats appender.
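A minimal sketch of the "wrapping encoder" idea, written without Logback types so it stays self-contained: a `Function<E, byte[]>` stands in for the delegate JSON encoder, and the wrapper only adds a Lumberjack v2 `'J'` frame around its output. `LumberjackEncoderSketch` is a hypothetical name; the frame layout (`'2' 'J' <4-byte sequence> <4-byte length> <payload>`) and starting the sequence at 1 are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.function.Function;

/**
 * Hypothetical wrapping encoder: the delegate produces the JSON bytes,
 * this class only frames them. It never looks inside the JSON.
 */
final class LumberjackEncoderSketch<E> {
    private final Function<E, byte[]> jsonDelegate;
    private int nextSequence = 1; // assumption: the first event gets sequence 1

    LumberjackEncoderSketch(Function<E, byte[]> jsonDelegate) {
        this.jsonDelegate = jsonDelegate;
    }

    byte[] encode(E event) {
        byte[] json = jsonDelegate.apply(event);
        return ByteBuffer.allocate(10 + json.length)
                .put((byte) '2')        // protocol version
                .put((byte) 'J')        // JSON data frame
                .putInt(nextSequence++) // sequence number, big-endian
                .putInt(json.length)    // payload length, big-endian
                .put(json)
                .array();
    }
}
```

The nesting mirrors the configuration in the comment: either encoder can change independently. Note that the window frame and ACK reading are still absent here, which is the coupling argument for moving the framing into the appender.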
Thanks for your input! I've updated the comments and README to indicate how I implemented this protocol and which version of the protocol it is. I also removed the PayloadConverter interface. Yes, you are right, it is redundant, and after reading the code again I felt it was unnatural to reconfigure the encoder in the appender like that. I've tested the changes on Graylog 3.3 with a Beats input and it seems to be working properly (sent about 100 messages with a window size of 10).
Nice! The removal of the PayloadConverter looks much cleaner and self-contained. Thanks for that update. I have one more major concern to discuss. This implementation claims to implement the full Lumberjack v2 protocol, which "aims to provide reliable, application-level, message transport.". However, I don't believe the current implementation fully implements the "reliable" portion of it. Specifically, the sequence numbers of the ACKs are currently read from the stream, but then ignored. This means that if the appender needs to reconnect for whatever reason, and ACKs have not been received, those un-acked events could be lost. I believe the appender should resend un-acked events on reconnect to meet the reliability requirement of the Lumberjack protocol. I haven't fully thought through how that would be implemented. Can you take a deeper look at what it would take to properly honor the sequence numbers in the ACKs, and provide full reliability?
The reliable part of the Lumberjack protocol is indeed not implemented yet. Events can be lost when the connection with the server is unexpectedly lost, but also when the appender decides to cleanly shut down the connection and switch to another destination (see the various destination strategies). I already had a look at how this could be implemented a while ago, and it seems that the RingBuffer can help us a lot... The idea here is to use a custom EventProcessor that would clear events from the RingBuffer only after the corresponding ACKs are received. The processor should somehow remember the sequence (index in the RingBuffer) of the last sent event and the sequence of the last ACK event. Knowing these sequences, the processor can re-send un-acked events when appropriate by fetching them from the
Just throwing ideas here and sketching a possible approach...
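A minimal sketch of that bookkeeping, assuming per-event sequence numbers and cumulative ACKs. It swaps the Disruptor RingBuffer for a plain `ArrayDeque` to stay self-contained, so it illustrates the idea (keep events until acknowledged, resend the rest after reconnecting) rather than the proposed EventProcessor. `UnackedEvents` and its methods are hypothetical, and restarting numbering at 1 on each new connection is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical tracker of events written to the socket but not yet acknowledged. */
final class UnackedEvents<E> {
    private static final class Sent<T> {
        final long sequence;
        final T event;

        Sent(long sequence, T event) {
            this.sequence = sequence;
            this.event = event;
        }
    }

    private final Deque<Sent<E>> pending = new ArrayDeque<>();
    private long lastSentSequence;

    /** Records an event that was just written and returns the sequence number it was sent with. */
    synchronized long sent(E event) {
        long sequence = ++lastSentSequence;
        pending.addLast(new Sent<>(sequence, event));
        return sequence;
    }

    /** An ACK for sequence n releases every pending event up to and including n. */
    synchronized void acked(long sequence) {
        while (!pending.isEmpty() && pending.peekFirst().sequence <= sequence) {
            pending.removeFirst();
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    /**
     * Called after reconnecting (to the same or another destination): returns the
     * un-acked events, oldest first, for the caller to send again through sent().
     * Assumption: a new connection starts a fresh sequence, so numbering restarts at 1.
     */
    synchronized List<E> drainForResend() {
        List<E> events = new ArrayList<>(pending.size());
        for (Sent<E> s : pending) {
            events.add(s.event);
        }
        pending.clear();
        lastSentSequence = 0;
        return events;
    }
}
```

The memory held here is bounded by the window size in steady state, since the sender stops at a full window; it only grows if a reconnect takes long and the caller keeps accepting events.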
The PayloadConverter seems to be tied to the transmission protocol, doesn't it? As far as I can tell, this component represents the frames of the protocol...
This protocol is implemented in the TcpSocketAppender. The framing imposed by Lumberjack should therefore be kept "private" to the "LumberjackTcpAppender" and not exposed as an Encoder or anything alike. Users should not even be aware of it... Another option is to introduce the concept of a "wire protocol" (njson, lumberjack, etc.) and keep a single TcpSocketAppender configured with the desired protocol. This could make the design more flexible and even open the door to additional protocols in the future...
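The "wire protocol" option could look roughly like the following sketch: the JSON encoding stays unchanged, and a pluggable strategy decides how each document is framed and whether ACKs must be read back. `WireProtocol`, `NdjsonProtocol` and `LumberjackV2Protocol` are hypothetical names; "njson" is read here as newline-delimited JSON, and the Lumberjack frame layout is assumed to be `'2' 'J' <seq> <length> <payload>`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical "wire protocol" abstraction: one TCP appender, pluggable framing. */
interface WireProtocol {
    /** Frames one already-encoded JSON document for the wire. */
    byte[] frame(long sequence, byte[] json);

    /** Whether the appender must read ACKs from the socket for this protocol. */
    boolean expectsAcks();
}

/** Newline-delimited JSON: the document followed by '\n', no sequence numbers, no ACKs. */
final class NdjsonProtocol implements WireProtocol {
    @Override
    public byte[] frame(long sequence, byte[] json) {
        byte[] line = Arrays.copyOf(json, json.length + 1);
        line[json.length] = '\n';
        return line;
    }

    @Override
    public boolean expectsAcks() {
        return false;
    }
}

/** Lumberjack v2 JSON frame (assumed layout); ACKs are read back by the appender. */
final class LumberjackV2Protocol implements WireProtocol {
    @Override
    public byte[] frame(long sequence, byte[] json) {
        return ByteBuffer.allocate(10 + json.length)
                .put((byte) '2').put((byte) 'J')
                .putInt((int) sequence) // low 32 bits, i.e. the unsigned wire value
                .putInt(json.length)
                .put(json)
                .array();
    }

    @Override
    public boolean expectsAcks() {
        return true;
    }
}
```

With this split, users would pick a protocol in the appender configuration and never see the framing, which matches the "keep it private" goal above.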
/**
 * A queue of currently accepted ACKs from remote
 */
private final BlockingQueue<AckEvent> ackEvents = new ArrayBlockingQueue<>(10);
Do we need to keep a queue of the last 10 received ACKs, or is it enough to keep only the latest one? AFAIK an ACK received for sequence x acknowledges all events up to and including x.
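If ACKs are indeed cumulative, a single monotonically increasing value could replace the `ArrayBlockingQueue<AckEvent>`. A minimal sketch under that assumption; `LatestAck` and its methods are hypothetical names:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical replacement for a queue of recent ACKs: since an ACK for sequence x
 * acknowledges every event up to and including x, only the highest value matters.
 */
final class LatestAck {
    private final AtomicLong highestAcked = new AtomicLong(0);

    /** Called by the ACK reader thread; a lower or duplicate ACK never moves the value backwards. */
    void onAck(long sequence) {
        highestAcked.accumulateAndGet(sequence, Math::max);
    }

    /** Number of sent events still waiting for an ACK, given the last sequence sent. */
    long inFlight(long lastSentSequence) {
        return lastSentSequence - highestAcked.get();
    }
}
```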
public byte[] encode(E event) {
    if (counter.get() == windowSize) {
        try {
            ackEvents.take();
The WriteTimeoutRunnable may decide to close the socket whenever it detects a dead peer (that is, a write operation taking too long). At that time, the EventHandler thread may be waiting for the next ACK before writing the current event. If the socket is closed, the ACK will never come and the thread will never return from ackEvents.take() (the AckReaderCallable will "die" because of the IOException received when the socket is closed).
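One way around the blocking `take()` is to wait on a condition that is signalled both by incoming ACKs and by the socket being closed, with a timeout as a last resort. The sketch below assumes cumulative ACKs and one slot per sent event; `WindowGate`, `awaitSlot`, `onAck` and `close` are hypothetical names, and wiring `close()` into the write-timeout and ACK-reader failure paths is left out.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate the event-handler thread waits on when the window is full. */
final class WindowGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final int windowSize;
    private long lastSent;  // sequence of the last event allowed through
    private long lastAcked; // highest acknowledged sequence (ACKs assumed cumulative)
    private boolean closed;

    WindowGate(int windowSize) {
        this.windowSize = windowSize;
    }

    /**
     * Waits until one more event may be sent. Returns false if the connection was
     * closed, the timeout elapsed, or the thread was interrupted, so the caller can
     * reconnect instead of hanging forever.
     */
    boolean awaitSlot(long timeout, TimeUnit unit) {
        long remaining = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!closed && lastSent - lastAcked >= windowSize) {
                if (remaining <= 0) {
                    return false;
                }
                remaining = changed.awaitNanos(remaining);
            }
            if (closed) {
                return false;
            }
            lastSent++;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Called by the ACK reader thread. */
    void onAck(long sequence) {
        lock.lock();
        try {
            lastAcked = Math.max(lastAcked, sequence);
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Called when the socket is closed (write timeout, or IOException in the ACK reader). */
    void close() {
        lock.lock();
        try {
            closed = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

Returning `false` lets the event-handler thread treat a closed connection like any other send failure (reconnect, possibly to another destination) rather than blocking indefinitely.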
Thanks for the suggestions. I would normally just use something like CircularFifoBuffer and implement it like this:
The Disruptor's RingBuffer seems like a really interesting and promising idea, but I need to learn more about it to use it properly. I will also try to fix the other issues pointed out, but I have already removed
Thanks for all the feedback and suggestions, btw.
8e7c02d to 926c65a
I had been trying to configure logstash-logback-encoder to send logs to Graylog while reusing an existing Beats input that is also used by Filebeat. Unfortunately, I couldn't get it to work, as the appenders and encoders output plain JSON. Graylog, on the other hand, refused to accept such input, throwing an exception that said "Unknown beats protocol version".
I analyzed the problem, and it seems that Graylog only accepts input in the Lumberjack protocol format. I decided to modify CompositeJsonEncoder so it can be extended with custom payload converters. Those converters can be used to convert the whole encoder output to a given format (e.g. a Beats-compatible format, or gzip if someone is willing to implement it). The converters can be chosen in the configuration and are optional: if none are specified, everything works as before.