Support for AMQP (RabbitMQ preferred) as transport #190
Comments
+1 ! Same use case here ! |
Why not lumberjack -> logstash (collector) -> AMQP -> logstash (processing)? Understood that it's another application to run. Pete Bowden |
That would introduce a useless logstash collector layer, and under load we might need multiple collector instances, load balanced as well, so... |
Hi guys, |
It is unlikely logstash-forwarder will support AMQP. If you need AMQP support, logstash can do this, and as suggested above, I also recommend logstash-forwarder -> logstash -> amqp. This project aims to solve a small specific problem and adding AMQP support is against that goal, in a way. Logstash itself supports many protocols, AMQP included, so I recommend you use that instead if you need to use AMQP :) |
@jordansissel can you explain the 'small specific problem' with which AMQP support conflicts? Based on the documentation in the Readme, the output channel is only discussed with regard to the requirements for adding a new protocol and AMQP meets these goals. |
Those quotes are from the readme:
My setup would benefit greatly from being able to run something with a lower RAM footprint than logstash itself. That's what I want to use logstash-forwarder for.
RabbitMQ supports ssl connections. http://www.rabbitmq.com/ssl.html Creating another server, or instance of logstash, just to stage logs from logstash-forwarder to rabbitmq feels kind of messy. It introduces another place where things could break down. logstashforwarder -> logstash stager -> rabbitmq -> logstash indexer -> elasticsearch is a pretty long chain... Anyway, that's why I'd like to see logstash-forwarder be able to send to rabbitmq. :) |
@joerocklin I lost interest in AMQP around 2-3 years ago when the AMQP ecosystem fractured into brokers that supported 0.8, 0.9, 0.9.1, and 1.0. I don't know the current state of things, but I can confirm that Logstash renamed the "amqp" plugin to "rabbitmq" simply because through that fracture, and perhaps by accident, the only known-supported broker for the logstash amqp plugin was RabbitMQ - nothing else really worked due to protocol deviations. So from my experience, "AMQP" is this nebulous cloud of things that probably actually speak different protocols despite claims of using whatever they are calling "AMQP." If we focus specifically on RabbitMQ, what is the benefit in putting a broker in between lsf and logstash, where today the protocol used does not require a broker? Operationally, it has been wonderful for users that a broker has not been required between lsf and logstash.
RabbitMQ is a message passing system, so the end goal of moving your logs is never going to be "store them in rabbitmq" because rabbit is a transit system, not a storage system. Where do they go after that? I want to keep logstash-forwarder easy to maintain and support, and it's not clear to me how adding RabbitMQ support would make it easier to maintain (more code) and easier to support (more complexity in setup, RabbitMQ is not well understood by many who use it based on my experiences). |
My need for a queue is to prevent loss of data when the indexer goes down. I just saw this: http://michael.bouvy.net/blog/en/2013/12/06/use-lumberjack-logstash-forwarder-to-forward-logs-logstash/#comment-1423397187 which makes me think (as I commented there) that logstash-forwarder removes my need for a queue entirely. Am I right? |
The design of the logstash-forwarder is to prevent loss of data when remote server goes down. There is no need for a broker agent in between lsf and Logstash. So, you are right! The "queue" that logstash-forwarder uses is actually the files it is reading from, and it uses a network protocol that ensures reliable delivery of that queue's contents (the lines of your logs) to downstream servers. |
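To make the "the file is the queue" idea concrete, here is a minimal Python sketch. It is not logstash-forwarder's actual code (which is written in Go); the function name `ship_pending`, the `send` callback, the batch size, and the byte-offset bookkeeping are all assumptions made for illustration. The point it shows: the read offset only moves forward after the receiver acknowledges a batch, so anything unacknowledged is simply read from the file again later.

```python
from typing import Callable, List


def ship_pending(
    path: str,
    offset: int,
    send: Callable[[List[bytes]], bool],
    batch_size: int = 2,
) -> int:
    """Ship complete lines found after `offset`; return the last acked offset.

    `send` is a hypothetical transport callback that returns True only when
    the receiver acknowledged the whole batch.
    """
    with open(path, "rb") as log:
        while True:
            log.seek(offset)
            batch: List[bytes] = []
            end = offset
            for _ in range(batch_size):
                line = log.readline()
                if not line.endswith(b"\n"):
                    break  # end of file or a half-written line: pick it up later
                batch.append(line[:-1])
                end += len(line)
            if not batch or not send(batch):
                # Nothing new, or no ack: keep the old offset so the same
                # lines are read from the file again on the next call.
                return offset
            offset = end  # advance only after the receiver acked
```

A caller would persist the returned offset and pass it back in on the next run; if the downstream server is unreachable, the offset stays put and no separate broker is needed to hold the backlog.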
That means I can reduce my chain to logstashforwarder -> logstash indexer -> elasticsearch. Nice. Thanks for the quick answers! |
RabbitMQ is what I'm interested in, so just focusing on it is fine with me. One scenario to consider is an already deployed RabbitMQ infrastructure which allows traffic from various network segments to communicate with it. If I can plug my messaging transit into a system which is already established and trusted for security, then my cost for getting messages from point A to point B is drastically reduced as the infrastructure is already in place. No new systems to deploy, no new ports to open or traffic patterns to identify. Depending on how RabbitMQ is deployed, there could be extra durability in the message transit in the event of failures. So in my case, I'm looking for: |
@joerocklin I am totally happy to have you use RabbitMQ, btw. In this case, you can achieve success by using something other than logstash-forwarder. Logstash itself supports rabbitmq output. Further, there are probably a dozen other projects that exist to forward logs and also support different protocols. Can you use one of those? If not, why not? |
@jordansissel For the reasons noted on the logstash page, running logstash proper on the app nodes is rather 'heavyweight', so something with a lighter footprint is desirable. Whether this is a 'real' problem or a perceived one is debatable, but either way: it's a problem. Using something from the same authors is nice, as the perception is that changes will occur in lockstep and we're less likely to get hit with strange update issues. Perhaps there need to be some updates to the documentation to answer some extra questions:
Why not use something else: I'm looking at some of the other options (my original comment was back from June). I would still really like to use something from the same authors of elasticsearch for the reasons mentioned above. If you know of other reliable projects that provide answers to requests that you do not plan to implement, it would be really helpful to provide some links to them. |
@joerocklin Regarding alternatives, I can recommend https://github.com/josegonzalez/beaver, a very lightweight log forwarder that talks RabbitMQ (and various other protocols). It is written in Python, and we have been using it on a few hundred machines for about a year now, with no perceivable problems. @jordansissel Since I opened the initial ticket, I’d like to sharpen the requirements I stated in my initial post:
I hope this makes my intentions a bit clearer. |
I did some thinking about this. Three points came to mind. First, I am personally resisting rabbitmq due to previous and many bad experiences with AMQP, its fracture, and its complexity and that complexity's impact on users. Summarize my view here simply as "opinion" and we can throw it in the trash because it's of little value to the technical discussion at hand - my opinion of amqp has nothing to do with your opinion, experience, or need of amqp. I want to be clear that I want to take my opinion here out of the discussion, so I'll try to avoid it in the future :) Second, I don't remember much about AMQP or RabbitMQ, so I have little confidence in my own personal ability to support users on such a feature. This lack of confidence manifests itself in my resistance to the feature. My confidence, like my opinion, should not be forced to impact your business needs, requirements, opinions, or experiences. This is a community, not a "Jordan being by himself" and as such we can remove my fear and lack of confidence from the arguments against rabbitmq support. Third, we are actively working on a new protocol design, and the current model we are discussing internally doesn't seem like it would work well over RabbitMQ simply because there are going to be new needs for bidirectional communication between lsf and downstream servers. This new model is not set in stone, but it might be annoying to try and shoe-horn over RabbitMQ's protocol. I'll know more once we further discuss this internally. Given points 1 and 2 being maligned opinion and lack of confidence, we can throw that out. I am willing to consider RabbitMQ as a transport (even if I can't support it personally due to lack of knowledge), but only once we figure out what the new protocol concept is going to look like. Does this make sense? |
@joerocklin and @lxndrp and everyone else on this ticket: I very much appreciate your efforts and time spent in this discussion. |
@lxndrp Hi Alex, as @jordansissel mentioned, we are in the review stage of the new protocol and other enhancements that would address the reliability and resilience issues, and of course would love to get the community feedback on this. I'll post an update on this here when we've gone through our internal review cycle on this. |
Thanks @jordansissel, @alphazero for the update. I'd be very happy to provide feedback or otherwise help with LSF. Let me know if you need anything. |
@jordansissel you say that LSF can observe low traffic and network crashes and manage log shipping. What if a network problem happens just after a log has been shipped? That log would be lost. Does LSF really ensure guaranteed delivery (using e.g. a dead letter channel)? I'd appreciate your hints, folks. |
@mohben the guarantee is that it will be delivered at least once to the receiver (logstash). In your scenario it will ship the log again, since it could not confirm that the remote side received it successfully: no acknowledgement was received. |
Hmmm, so LSF is expecting an ack from its output? |
Yes, the lumberjack protocol has acks. Logstash will ack once it has queued the log. However, if logstash crashes it can lose whatever is in its queue, which is 10 items. But that's it. There's no end-to-end guarantee from forwarder to elasticsearch. But having a guarantee on the network hop from forwarder to logstash at least reduces the impact significantly. |
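The "at least once" behaviour described in the last few comments can be sketched in a few lines of Python. This is a toy model, not the lumberjack wire protocol: `Receiver`, `lossy_link`, and `send_until_acked` are hypothetical names, and the "lost ack" is simulated by raising `ConnectionError` after the receiver has already queued the batch. It shows why a lost acknowledgement leads to a resend and therefore to a duplicate rather than a loss.

```python
from typing import Callable, List


class Receiver:
    """Stand-in for Logstash: queues events, then acknowledges them."""

    def __init__(self) -> None:
        self.queued: List[str] = []

    def receive(self, batch: List[str]) -> int:
        self.queued.extend(batch)
        return len(batch)  # ack: number of events accepted


def lossy_link(receiver: Receiver, drop_acks: int) -> Callable[[List[str]], int]:
    """Return a transmit function whose first `drop_acks` acks get lost."""
    remaining = drop_acks

    def transmit(batch: List[str]) -> int:
        nonlocal remaining
        acked = receiver.receive(batch)  # the data itself always arrives
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("ack lost in transit")
        return acked

    return transmit


def send_until_acked(
    batch: List[str],
    transmit: Callable[[List[str]], int],
    max_attempts: int = 3,
) -> int:
    """Resend `batch` until it is acknowledged; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        try:
            if transmit(batch) == len(batch):
                return attempt
        except ConnectionError:
            pass  # no ack: the sender cannot tell what arrived, so it resends
    raise ConnectionError("batch never acknowledged")
```

With one dropped ack, `send_until_acked(["e1", "e2"], lossy_link(receiver, 1))` needs two attempts and the receiver ends up holding each event twice. That is the trade-off of at-least-once delivery: nothing is lost on this hop, but downstream processing may see duplicates.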
I was about to add a feature request for AWS SQS output and then saw that a related issue (output to Kafka) was directed here, so I want to add my notes here. First of all, thanks for the wonderful software, it is lightweight and works really well. And my notes on the subject: Suppose we have many servers generating lots of logs, and both server count and log volume tend to grow. But we don't want or need to process, store and query the logs in realtime and at the same pace as the log generation, because we want to lower the costs. For example: Most mobile games have an activity graph which tends to increase at certain times of the day and then gradually decrease at other times of the day. We can tolerate logs piling up in busy times because we know that there will be non-busy times and the logs will slowly be drained. What we need is:
|
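The "pile up at peak, drain off-peak" reasoning in the comment above can be checked with a little arithmetic. The following Python sketch is only an illustration: the function name `backlog_per_hour` and all the numbers in the usage note are made up, and the processing side is assumed to drain at a constant rate.

```python
from typing import List


def backlog_per_hour(arrivals: List[int], drain_rate: int) -> List[int]:
    """Return the queue backlog at the end of each hour.

    `arrivals` holds hypothetical events generated per hour; `drain_rate`
    is the assumed constant number of events processed per hour.
    """
    backlog = 0
    history: List[int] = []
    for arrived in arrivals:
        # The backlog grows when arrivals exceed the drain rate and
        # shrinks otherwise, but it can never go below zero.
        backlog = max(0, backlog + arrived - drain_rate)
        history.append(backlog)
    return history
```

For hourly arrivals of `[100, 300, 300, 50, 50, 50]` and a drain rate of 150, the backlog is `[0, 150, 300, 200, 100, 0]`: the processing side is sized well below the peak of 300, and the queue absorbs the difference until the quiet hours clear it.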
It seems that people here (including me) are looking for some kind of "rabbitmq-forwarder". logstash-forwarder is provided by logstash community as a lightweight tool to send data to logstash. Why should it get fatter to communicate with something else ? A tool sending data to rabbitmq should be provided by rabbitmq community! And well...it actually exists: it's rabbitmq itself, with the Federation plug-in: https://www.rabbitmq.com/federation.html The only thing that I miss now is a clean way to send data to the rabbitmq "local agent" from stdin... Many small scripts can do that, but nothing in the "clean ecosystem", i.e. maintained, updated, packaged, ... |
I played with ZeroMQ some time ago and I now have a stable implementation in Log Courier if you are interested, which is working quite well for me. Log courier is based on logstash-forwarder - it has all my major changes and improvements. The idea is to completely phase out a requirement of any other item in the stack except the shipper and Logstash. So we can set up shippers to load balance events across multiple Logstash instances and automatically fail over and retransmit as required. Thought it worth mentioning if people are looking for something similar. I'll be improving it more over time too so feedback is welcome. Building it with ZeroMQ 3.2 is fairly straight forward on most distributions as there are zeromq3 packages available (CentOS/Ubuntu etc). It can also run with Curve encryption if you manage to get ZeroMQ 4.0 packages. I'm planning to provide CentOS packages with zeromq3 support soon. |
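As a rough picture of "load balance events across multiple Logstash instances and automatically fail over", here is a small Python sketch. It does not reflect Log Courier's implementation; the names `send_with_failover` and `ship_all`, and the modelling of each endpoint as a callable that returns True on acknowledgement, are assumptions for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical endpoint: takes a batch, returns True if it acknowledged it.
Endpoint = Callable[[List[str]], bool]


def send_with_failover(
    batch: List[str], endpoints: Sequence[Endpoint], start: int
) -> int:
    """Try endpoints in order from `start`; return the index that acked."""
    for step in range(len(endpoints)):
        index = (start + step) % len(endpoints)
        if endpoints[index](batch):
            return index
    raise ConnectionError("no endpoint acknowledged the batch")


def ship_all(batches: List[List[str]], endpoints: Sequence[Endpoint]) -> List[int]:
    """Spread batches round-robin, skipping endpoints that do not ack."""
    used: List[int] = []
    next_start = 0
    for batch in batches:
        index = send_with_failover(batch, endpoints, next_start)
        used.append(index)
        next_start = (index + 1) % len(endpoints)  # rotate to spread load
    return used
```

If the middle one of three endpoints is down, four batches land on endpoints 0, 2, 0, 2: the shipper keeps rotating and retransmits to the next instance whenever one fails to acknowledge.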
@driskell I've tried Log Courier which works fine but unfortunately doesn't fit my need as it doesn't support JSON as input codec. As it seems you've planned to support it, I'll benchmark it as soon as it's available. |
@gauthier-delacroix You can still use the JSON filter provided by Logstash which is very quick (it uses an extremely fast Jackson JSON library). This is how I manage it at the moment. But yes I do plan to allow additional codecs - since adding them does not increase resource usage at all unless you enable them, so it keeps light weight - but adds flexibility when needed. |
@driskell As I'm using varnishncsa to generate my logs, I can directly format them in JSON. Having around 50kRPS peaks is a good reason to avoid using logstash filters as much as possible (I'll need some anyway) and distribute the overhead across my varnish servers (which need as much free RAM as possible but have a lot of free CPU time). I'll keep an eye on Log Courier anyway and try it again as soon as JSON codec is available because I'm interested by many of your features. |
Just want to add a big +1 for the lsf -> RabbitMQ feature request. I have a very similar use case to @joerocklin and @lxndrp where I want to leverage an existing RabbitMQ infrastructure to transport logs. Having said that, I really appreciate the effort put in by the authors in actively discussing this new feature in spite of lacking technical confidence. |
LSF -> SQS would be awesome 👍 |
From the sounds of things, lsf's functionality is going to be rolled into --David Reagan |
Thank you for discussing this issue, I really appreciate the work you're doing and the thought going into this. Here is my contribution to why this is an important feature when you start running Logstash and Elasticsearch at scale. @jordansissel asked:
Elastic recommends using a broker/queue when "data coming into a Logstash pipeline exceeds the Elasticsearch cluster’s ability to ingest the data". The assumption that forwarders can hold on to data until successfully pushed is invalid if your servers are part of an automatically scaling cluster or group. For example, servers in an AWS AutoScaling Group may be terminated at any time, and any log events not yet pushed will then be lost. You must use a queue, or you will regularly lose data. The recommended queueing pipeline design puts a shipper between the Logstash Forwarders and the queue: In this case the forwarder->logstash only limitation adds unnecessary complexity. I understand there are issues involved with supporting other tools, like RabbitMQ, but from a systems design perspective the shipper in this picture is pure waste. If you already have a queue where all events are buffered, why not forward the log events straight to the queue? Why send them via an additional Logstash server? This is not a problem with only a handful of forwarders, but with hundreds or thousands of them you will need an increasing number of powerful shippers. @gauthier-delacroix suggests shifting the forwarding responsibility to RabbitMQ - I'm all for that, but if queueing is part of the recommended design of a scalable Logstash, wouldn't it make sense for Logstash Forwarder to facilitate it? There is a related discussion at elastic/logstash#3693, but I'm not sure how/if it addresses this issue. |
It's been 4 months since the last activity on this. Any updates on the way forward ? |
As logstash-forwarder is no longer under active development and was replaced by filebeat, it is best to continue this discussion under the following issue: elastic/beats#943 |
Folks,
I would appreciate very much if you'd consider adding AMQP as an output transport to lumberjack.
At least for the RabbitMQ and ActiveMQ implementations of AMQP, authentication, encryption, and compression are supported, so that the main requisites for a new transport are met.
The background is that we are running very large numbers of potential log shippers, and I'd like to distribute the load via our RabbitMQ HA clusters rather than via direct
lumberjack -> logstash
connections, as we cannot afford loss of messages on server crash or overload. RabbitMQ brings features for federation, queue rerouting, throttling and rate limiting, replication and many other things which logstash does not have. Let me know what you think.
Cheers,
Alexander