Support for AMQP (RabbitMQ preferred) as transport #190

Closed
lxndrp opened this issue Apr 5, 2014 · 35 comments

@lxndrp

lxndrp commented Apr 5, 2014

Folks,

I would appreciate very much if you'd consider adding AMQP as an output transport to lumberjack.

At least for the RabbitMQ and ActiveMQ implementations of AMQP, authentication, encryption, and compression are supported, so that the main requisites for a new transport are met.

The background is that we are running very large numbers of potential log shippers, and I'd like to distribute the load via our RabbitMQ HA clusters rather than via direct lumberjack -> logstash connections, as we cannot afford loss of messages on server crash or overload. RabbitMQ brings features for federation, queue rerouting, throttling and rate limiting, replication and many other things which logstash does not have.

Let me know what you think.

Cheers,
Alexander

@graph1zzlle

+1 !

Same use case here !

@petebowden

Why not lumberjack -> logstash (collector) -> AMQP -> logstash (processing)

Understood that it's another application to run.
Also lumberjack should stop shipping if logstash goes down?

Pete Bowden


@graph1zzlle

That would introduce a useless logstash collector layer, and with the load it might need multiple collector instances, load balanced as well, so...

@driskell
Contributor

driskell commented Apr 9, 2014

Hi guys,
I believe the elasticsearch team were already working on a ZeroMQ implementation, but that time was redirected to finishing LogStash 1.4. I think once things calm down there and resources get freed they'll probably finish it off.
Jason

@jordansissel
Contributor

It is unlikely logstash-forwarder will support AMQP. If you need AMQP support, logstash can do this, and as suggested above, I also recommend logstash-forwarder -> logstash -> amqp

This project aims to solve a small specific problem and adding AMQP support is against that goal, in a way. Logstash itself supports many protocols, AMQP included, so I recommend you use that instead if you need to use AMQP :)

@joerocklin

@jordansissel can you explain the 'small specific problem' with which AMQP support conflicts? Based on the documentation in the Readme, the output channel is only discussed with regard to the requirements for adding a new protocol and AMQP meets these goals.

@jerrac

jerrac commented Aug 27, 2014

Those quotes are from the readme:

Actual Problems: Logstash, for right now, runs with a footprint that is not friendly to underprovisioned systems such as EC2 micro instances; on other systems it is fine. This project will exist until that is resolved.

My setup would benefit greatly from being able to run something with a lower RAM footprint than logstash itself. That's what I want to use logstash-forwarder for.

The lumberjack protocol used by this project exists to provide a network protocol for transmission that is secure, low latency, low resource usage, and reliable.

RabbitMQ supports ssl connections. http://www.rabbitmq.com/ssl.html

Creating another server, or instance of logstash, just to stage logs from logstash-forwarder to rabbitmq feels kind of messy. It introduces another place where things could break down.

logstashforwarder -> logstash stager -> rabbitmq -> logstash indexer -> elasticsearch is a pretty long chain...

Anyway, that's why I'd like to see logstash-forwarder be able to send to rabbitmq. :)

@jordansissel
Contributor

@joerocklin I lost interest in AMQP around 2-3 years ago when the AMQP ecosystem fractured into brokers that supported 0.8, 0.9, 0.9.1, and 1.0. I don't know the current state of things, but I can confirm that Logstash renamed the "amqp" plugin to "rabbitmq" simply because through that fracture, and perhaps by accident, the only known-supported broker for the logstash amqp plugin was RabbitMQ - nothing else really worked due to protocol deviations. So from my experience, "AMQP" is this nebulous cloud of things that probably actually speak different protocols despite claims of using whatever they are calling "AMQP."

If we focus specifically on RabbitMQ, what is the benefit of putting a broker in between lsf and logstash, where today the protocol used does not require a broker? Operationally, it has been wonderful for users that a broker has not been required between lsf and logstash.

Creating another server, or instance of logstash, just to stage logs from logstash-forwarder to rabbitmq feels kind of messy.

RabbitMQ is a message passing system, so the end goal of moving your logs is never going to be "store them in rabbitmq" because rabbit is a transit system, not a storage system. Where do they go after that?

I want to keep logstash-forwarder easy to maintain and support, and it's not clear to me how adding RabbitMQ support would make it easier to maintain (more code) and easier to support (more complexity in setup, RabbitMQ is not well understood by many who use it based on my experiences).

@jerrac

jerrac commented Aug 27, 2014

My need for a queue is to prevent loss of data when the indexer goes down.

I just saw this: http://michael.bouvy.net/blog/en/2013/12/06/use-lumberjack-logstash-forwarder-to-forward-logs-logstash/#comment-1423397187 which makes me think (as I commented there) that logstash-forwarder removes my need for a queue entirely. Am I right?

@jordansissel
Contributor

My need for a queue is to prevent loss of data when the indexer goes down.

The design of the logstash-forwarder is to prevent loss of data when the remote server goes down. There is no need for a broker agent in between lsf and Logstash.

So, you are right!

The "queue" that logstash-forwarder uses is actually the files it is reading from, and it uses a network protocol that ensures reliable delivery of that queue's contents (the lines of your logs) to downstream servers.
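The "file is the queue" idea can be sketched in a few lines of Python. This is an illustration only, not logstash-forwarder's actual code: the `FileQueue` class, its method names, and the in-memory offset are assumptions made for the example. The reader advances its committed offset only after a batch is acknowledged, so unacknowledged lines are simply read again.

```python
import os
import tempfile


class FileQueue:
    """Treat a log file as a queue (hypothetical helper, for illustration).

    The committed byte offset marks the first line not yet acknowledged.
    """

    def __init__(self, path):
        self.path = path
        self.committed = 0

    def read_batch(self, max_lines):
        """Return (lines, next_offset), reading from the committed offset."""
        lines = []
        offset = self.committed
        with open(self.path, "rb") as fh:
            fh.seek(offset)
            while len(lines) < max_lines:
                raw = fh.readline()
                if not raw.endswith(b"\n"):
                    break  # end of file or a partially written line
                lines.append(raw[:-1].decode("utf-8"))
                offset += len(raw)
        return lines, offset

    def ack(self, offset):
        """Record that everything before `offset` was delivered."""
        self.committed = offset


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.log")
        with open(path, "wb") as fh:
            fh.write(b"line 1\nline 2\nline 3\n")
        queue = FileQueue(path)
        first, _ = queue.read_batch(2)       # sent, but no ack arrives
        retry, offset = queue.read_batch(2)  # same lines are read again
        queue.ack(offset)                    # ack received: advance
        rest, offset = queue.read_batch(2)
        queue.ack(offset)
        return first, retry, rest
```

A real shipper would also need to persist the offset across restarts and handle log rotation, both of which this sketch leaves out.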

@jerrac

jerrac commented Aug 27, 2014

That means I can reduce my chain to logstashforwarder -> logstash indexer -> elasticsearch. Nice. Thanks for the quick answers!

@joerocklin

RabbitMQ is what I'm interested in, so just focusing on it is fine with me.

One scenario to consider is an already deployed RabbitMQ infrastructure which allows traffic from various network segments to communicate with it. If I can plug my messaging transit into a system which is already established and trusted for security, then my cost for getting messages from point A to point B is drastically reduced as the infrastructure is already in place. No new systems to deploy, no new ports to open or traffic patterns to identify. Depending on how RabbitMQ is deployed, there could be extra durability in the message transit in the event of failures.

So in my case, I'm looking for:
logstash-forwarder -> existing RabbitMQ infrastructure -> logstash processing nodes

@jordansissel
Contributor

@joerocklin I am totally happy to have you use RabbitMQ, btw. In this case, you can achieve success by using something other than logstash-forwarder. Logstash itself supports rabbitmq output. Further, there are probably a dozen other projects that exist to forward logs and also support different protocols. Can you use one of those? If not, why not?

@joerocklin

@jordansissel For the reasons noted on the logstash page, running logstash proper on the app nodes is rather 'heavyweight', so something with a lighter footprint is desirable. Whether this is a 'real' problem or a perceived one is debatable, but either way: it's a problem. Using something from the same authors is nice, as the perception is that changes will occur in lockstep and we're less likely to get hit with strange update issues.

Perhaps there need to be some updates to the documentation to answer some extra questions:

  1. re: logstash-forwarder existing until the problems (under-provisioned systems going away or logstash getting lighter-weight) no longer exist - Since it's unlikely that under-provisioned systems will go away, are there plans for changing logstash in such a way as to remove the need for logstash-forwarder?
  2. re: Future Protocol Discussion - RabbitMQ can handle the transport security requirements, and the necessary pieces can be included in a packaged format (since there are go libs for rabbitMQ https://github.com/streadway/amqp). It doesn't care what the message content is, so it can be compressed and be whatever the other side needs. I have no idea what this would do to the resulting binary size, and that could be a concern.
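As a rough illustration of the "opaque, compressed payload" point, the sketch below packs a batch of log lines into bytes that a broker could carry without inspecting them. The JSON-list batch format and the `pack`/`unpack` names are assumptions for this example, and the actual publish call is left out because it depends on the client library in use.

```python
import json
import zlib


def pack(lines):
    """Serialize a batch of log lines to JSON and compress it into bytes."""
    return zlib.compress(json.dumps(lines).encode("utf-8"))


def unpack(payload):
    """Reverse of pack(): decompress and decode on the consuming side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Made-up sample data: repetitive access-log lines compress well.
lines = ["GET /index.html 200"] * 100
payload = pack(lines)
raw_size = len(json.dumps(lines).encode("utf-8"))
```

The consumer only needs to agree on the same batch format; the transport in between treats `payload` as a plain byte string.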

Why not use something else: I'm looking at some of the other options (my original comment was back from June). I would still really like to use something from the same authors of elasticsearch for the reasons mentioned above. If you know of other reliable projects that provide answers to requests that you do not plan to implement, it would be really helpful to provide some links to them.

@lxndrp
Author

lxndrp commented Aug 27, 2014

@joerocklin Regarding alternatives, I can recommend https://github.com/josegonzalez/beaver, a very lightweight log forwarder that talks RabbitMQ (and various other protocols). It is written in Python, and we have been using it on a few hundred machines for about a year now, with no perceivable problems.

@jordansissel Since I opened the initial ticket, I’d like to sharpen the requirements I stated in my initial post:

  1. I am asking specifically for RabbitMQ support, not AMQP. Being totally aware of the utter mess with AMQP standardization, I completely understand your reluctance to work on this. But RabbitMQ is rock-solid, the interfaces are quite stable and, as you stated before, logstash is supporting it already.
  2. What I really want is reliable and resilient end-to-end delivery. Given the amount of logs we have to process, that’s why I’d love to see RabbitMQ support, because it helps me build a quite fault-tolerant, scalable setup. Using LSF and logstash directly doesn’t really help me, because then I would have to a) scale logstash collector instances (which can be done, but is more costly than doing this with RabbitMQ) and b) build layers of HA around logstash which I am getting for free with RabbitMQ (Clustering, Federation, Shovels, Routing, etc.).
  3. Although there are other shippers that have a small footprint, I still feel that LSF has an even smaller one. It is a very well written, designed-for-performance, do-one-thing-right Go application; and I like that.
  4. (you might consider this as stupid, but anyway) I feel somewhat better getting a toolchain from one supplier rather than many. Nothing against Jose Gonzales, who did a great job in writing beaver; but still, if LSF is something supported by Elasticsearch (as a company contributing to Open Source), I’d prefer to get things from there – for the same reason I am using RabbitMQ rather than some esoteric MQ system found somewhere on GitHub.

I hope this makes my intentions a bit clearer.

@jordansissel
Contributor

I did some thinking about this. Three points came to mind.

First, I am personally resisting rabbitmq due to many previous bad experiences with AMQP, its fracture, and its complexity and that complexity's impact on users. Summarize my view here simply as "opinion" and we can throw it in the trash because it's of little value to the technical discussion at hand - my opinion of amqp has nothing to do with your opinion, experience, or need of amqp. I want to be clear that I want to take my opinion here out of the discussion, so I'll try to avoid it in the future :)

Second, I don't remember much about AMQP or RabbitMQ, so I have little confidence in my own personal ability to support users on such a feature. This lack of confidence manifests itself in my resistance to the feature. My confidence, like my opinion, should not be forced to impact your business needs, requirements, opinions, or experiences. This is a community, not a "Jordan being by himself" and as such we can remove my fear and lack of confidence from the arguments against rabbitmq support.

Third, we are actively working on a new protocol design, and the current model we are discussing internally doesn't seem like it would work well over RabbitMQ simply because there are going to be new needs for bidirectional communication between lsf and downstream servers. This new model is not set in stone, but it might be annoying to try and shoe-horn it over RabbitMQ's protocol. I'll know more once we further discuss this internally.

Given points 1 and 2 being maligned opinion and lack of confidence, we can throw that out. I am willing to consider RabbitMQ as a transport (even if I can't support it personally due to lack of knowledge), but only once we figure out what the new protocol concept is going to look like.

Does this make sense?

@jordansissel
Contributor

@joerocklin and @lxndrp and everyone else on this ticket: I very much appreciate your efforts and time spent in this discussion.

@alphazero
Contributor

@lxndrp Hi Alex, as @jordansissel mentioned, we are in the review stage of the new protocol and other enhancements that would address the reliability and resilience issues, and of course would love to get the community feedback on this. I'll post an update on this here when we've gone through our internal review cycle on this.

@lxndrp
Author

lxndrp commented Sep 1, 2014

Thanks @jordansissel, @alphazero for the update. I'd be very happy to provide feedback or otherwise help with LSF. Let me know if you need anything.

@mohben

mohben commented Sep 2, 2014

@jordansissel you say that LSF can observe low traffic and network crashes and manage log shipping. What if a network problem happens just after a log has been shipped? That log would then be lost. Does LSF really ensure guaranteed delivery (using e.g. a dead letter channel)?

I'd appreciate your hints, folks.

@driskell
Contributor

driskell commented Sep 2, 2014

@mohben the guarantee is that it will be delivered at least once to the receiver (logstash)

In your scenario it will ship the log again since it could not guarantee the remote side received it successfully as no acknowledgement was received.

@mohben

mohben commented Sep 2, 2014

Hmmm, so LSF is expecting an ack from its output?

@driskell
Contributor

driskell commented Sep 2, 2014

Yes the lumberjack protocol has acks. Logstash will ack once it's queued the log.

However, if logstash crashes it can lose whatever is in its queue, which is 10 items. But that's it. There's no end-to-end guarantee from forwarder to elasticsearch. But having a guarantee on the network hop from forwarder to logstash at least reduces the impact significantly.
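The at-least-once behaviour described in this exchange can be simulated without any network code. The sketch below is a toy model, not the lumberjack protocol itself: the `Receiver` class, the `ship` function, and the "lost ack" switch are all assumptions for illustration. It shows why a lost acknowledgement leads to a duplicate rather than to a lost event.

```python
class Receiver:
    """Hypothetical receiver that stores events but may fail to return an ack."""

    def __init__(self, lose_ack_on_attempts):
        self.stored = []
        self.attempt = 0
        self.lose_ack_on_attempts = set(lose_ack_on_attempts)

    def receive(self, batch):
        self.attempt += 1
        self.stored.extend(batch)  # the events are queued on the remote side
        # Returning False models a network failure after the data arrived.
        return self.attempt not in self.lose_ack_on_attempts


def ship(batch, receiver, max_attempts=5):
    """Resend the batch until an ack arrives; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if receiver.receive(batch):
            return attempt
    raise RuntimeError("no ack received; the batch stays unacknowledged")


# The first ack is lost, so the shipper sends the same batch a second time.
receiver = Receiver(lose_ack_on_attempts=[1])
attempts = ship(["event-a", "event-b"], receiver)
```

After this run the receiver holds each event twice, which is the price of at-least-once delivery; removing duplicates, if needed, has to happen downstream.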

@cemuzunlar

I was about to add a feature request for AWS SQS output and then saw that related issues (output to Kafka) are directed here, so I want to add my notes here.

First of all, thanks for the wonderful software, it is lightweight and works really well.

And my notes on the subject:

Suppose we have many servers generating lots of logs, and both the server count and the log volume tend to grow. But we don't want or need to process, store, and query the logs in real time and at the same pace as the log generation, because we want to lower the costs.

For example: Most mobile games have an activity graph which tends to increase in certain times of the day and then gradually decrease in certain times of the day. We can tolerate logs to pile up in busy times because we know that there will be non-busy times and logs will slowly be drained.

What we need is:

  1. A light and fast shipper sending raw logs from the local machine to a remote intermediary storage (AWS SQS etc.) which is easy to scale and durable. So we'll be sure our log message has moved off the generating machine and is stored in reliable storage for later consumption.

  2. We'll then consume the storage at a pace we need. There may be millions of logs in the storage and we may also be adding millions of new logs every second to the storage. But we can consume with one or more small logstash machines slowly and output to the final destinations. (ElasticSearch etc.)
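The buffering argument in the two points above comes down to simple arithmetic: the queue absorbs the difference between the peak production rate and a fixed, cheaper consumption rate. The sketch below works through that with made-up hourly numbers; the function name and all figures are assumptions for illustration, not measurements.

```python
def backlog_over_time(produced_per_hour, consume_per_hour):
    """Return the queue depth at the end of each hour."""
    depth = 0
    history = []
    for produced in produced_per_hour:
        # The consumer drains at a fixed rate; depth can never go below zero.
        depth = max(0, depth + produced - consume_per_hour)
        history.append(depth)
    return history


# Made-up numbers: a busy evening followed by a quiet night.
produced = [500, 2000, 3000, 1000, 200, 100, 100]
history = backlog_over_time(produced, consume_per_hour=1200)
peak_backlog = max(history)
```

With these numbers the consumer is sized for 1200 events per hour instead of the 3000 seen at peak, at the cost of a backlog that takes a few quiet hours to drain.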

@gdlx

gdlx commented Nov 3, 2014

It seems that people here (including me) are looking for some kind of "rabbitmq-forwarder". logstash-forwarder is provided by the logstash community as a lightweight tool to send data to logstash. Why should it get fatter to communicate with something else?

A tool sending data to rabbitmq should be provided by rabbitmq community! And well...it actually exists: it's rabbitmq itself, with the Federation plug-in: https://www.rabbitmq.com/federation.html

The only thing that I miss now is a clean way to send data to the rabbitmq "local agent" from stdin... Many small scripts can do that, but nothing in the "clean ecosystem", i.e. maintained, updated, packaged, ...

@driskell
Contributor

driskell commented Nov 3, 2014

I played with ZeroMQ some time ago and I now have a stable implementation in Log Courier if you are interested, which is working quite well for me. Log courier is based on logstash-forwarder - it has all my major changes and improvements.

The idea is to completely phase out a requirement of any other item in the stack except the shipper and Logstash. So we can set up shippers to load balance events across multiple Logstash instances and automatically fail over and retransmit as required.
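The load-balancing and failover idea can be sketched as a round-robin over endpoints that skips any instance currently marked as down. This is a simplified model and not Log Courier's implementation: the `distribute` function, the endpoint names, and the `is_up` health check are hypothetical.

```python
import itertools


def distribute(batches, endpoints, is_up):
    """Assign each batch to the next healthy endpoint in round-robin order."""
    order = itertools.cycle(endpoints)
    assignments = []
    for batch in batches:
        # Try each endpoint at most once per batch before giving up.
        for _ in range(len(endpoints)):
            endpoint = next(order)
            if is_up(endpoint):
                assignments.append((batch, endpoint))
                break
        else:
            raise RuntimeError("no endpoint available; keep the batch and retry later")
    return assignments


# Hypothetical setup: three Logstash instances, one of them unreachable.
endpoints = ["logstash-1", "logstash-2", "logstash-3"]
down = {"logstash-2"}
result = distribute(["b1", "b2", "b3", "b4"], endpoints, lambda e: e not in down)
```

Combined with acknowledgements, a batch that was sent to an instance that then fails would simply be handed to the next healthy endpoint and retransmitted.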

Thought it worth mentioning if people are looking for something similar. I'll be improving it more over time too so feedback is welcome. Building it with ZeroMQ 3.2 is fairly straightforward on most distributions as there are zeromq3 packages available (CentOS/Ubuntu etc). It can also run with Curve encryption if you manage to get ZeroMQ 4.0 packages. I'm planning to provide CentOS packages with zeromq3 support soon.

@gdlx

gdlx commented Nov 5, 2014

@driskell I've tried Log Courier which works fine but unfortunately doesn't fit my need as it doesn't support JSON as input codec. As it seems you've planned to support it, I'll benchmark it as soon as it's available.

@driskell
Contributor

driskell commented Nov 5, 2014

@gauthier-delacroix You can still use the JSON filter provided by Logstash which is very quick (it uses an extremely fast Jackson JSON library). This is how I manage it at the moment. But yes, I do plan to allow additional codecs, since adding them does not increase resource usage at all unless you enable them. That keeps it lightweight but adds flexibility when needed.

@gdlx

gdlx commented Nov 5, 2014

@driskell As I'm using varnishncsa to generate my logs, I can directly format them in JSON. Having around 50kRPS peaks is a good reason to avoid using logstash filters as much as possible (I'll need some anyway) and distribute the overhead across my varnish servers (which need as much free RAM as possible but have a lot of free CPU time).

I'll keep an eye on Log Courier anyway and try it again as soon as the JSON codec is available because I'm interested in many of your features.

@abhishekdelta

Just want to add a big +1 for the lsf -> RabbitMQ feature request. I have a very similar use case to @joerocklin and @lxndrp, where I want to leverage an existing RabbitMQ infrastructure to transport logs. Having said that, I really appreciate the effort put in by the authors in actively discussing this new feature in spite of lacking technical confidence.

@karlatkinson

LSF -> SQS would be awesome 👍
I've got a ton of EC2 instances I want to ship logs off without having to use full-on logstash.

@jerrac

jerrac commented Mar 25, 2015

From the sounds of things, lsf's functionality is going to be rolled into logstash itself. See http://www.elastic.co/guide/en/logstash/roadmap/current/index.html
Hopefully lowering RAM usage is part of that.

--David Reagan


@jonatanblue

Thank you for discussing this issue, I really appreciate the work you're doing and the thought going into this. Here is my contribution to why this is an important feature when you start running Logstash and Elasticsearch at scale.

@jordansissel asked:

what is the benefit of putting a broker in between lsf and logstash, where today the protocol used does not require a broker?

Elastic recommends using a broker/queue when "data coming into a Logstash pipeline exceeds the Elasticsearch cluster’s ability to ingest the data".

The assumption that forwarders can hold on to data until successfully pushed is invalid if your servers are part of an automatically scaling cluster or group. For example, servers in an AWS AutoScaling Group may be terminated at any time, and any log events not yet pushed will then be lost. You must use a queue, or you will regularly lose data.

The recommended queueing pipeline design puts a shipper between the Logstash Forwarders and the queue:
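[Image: deploy_5, the recommended deployment diagram with a shipper between the Logstash Forwarders and the queue]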

In this case the forwarder->logstash only limitation adds unnecessary complexity. I understand there are issues involved with supporting other tools, like RabbitMQ, but from a systems design perspective the shipper in this picture is pure waste. If you already have a queue where all events are buffered, why not forward the log events straight to the queue? Why send them via an additional Logstash server? This is not a problem with only a handful of forwarders, but with hundreds or thousands of them you will need an increasing number of powerful shippers.

@gauthier-delacroix suggests shifting the forwarding responsibility to RabbitMQ - I'm all for that, but if queueing is part of the recommended design of a scalable Logstash, wouldn't it make sense for Logstash Forwarder to facilitate it?

There is a related discussion at elastic/logstash#3693, but I'm not sure how/if it addresses this issue.

@adiworkoholic

It's been 4 months since the last activity on this. Any updates on the way forward?

@ruflin
Member

ruflin commented Mar 1, 2016

As logstash-forwarder is no longer under active development and was replaced by filebeat, it is best to continue this discussion under the following issue: elastic/beats#943

@ruflin ruflin closed this as completed Mar 1, 2016