added new apachelog plugin #576

Closed
wants to merge 2 commits into from

toni-moreno
Contributor

collectd-apachelog-plugin

Description

A collectd plugin that efficiently parses Apache log files to produce global and extended (per HTTP status code) metrics. It defines a new type, http_perf (performance), which carries the following metrics:

hit_rate (hits per second)
hit_x_interval (number of hits in the collectd interval; used to count hits over time)
rt_avg (average response time over the elapsed interval, in ms)
rt_max (maximum response time over the elapsed interval, in ms)
rt_min (minimum response time over the elapsed interval, in ms)
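
As an illustration of how these five values relate to each other, here is a minimal Python sketch that summarizes one interval of response times. It is not the plugin's C code: the `HttpPerf` and `summarize` names are hypothetical, and it assumes the input values are Apache `%D` times in microseconds that get converted to milliseconds.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HttpPerf:
    hit_rate: float            # hits per second
    hit_x_interval: int        # hits seen in this interval
    rt_avg: Optional[float]    # ms, None when there were no hits
    rt_max: Optional[float]    # ms
    rt_min: Optional[float]    # ms


def summarize(resp_times_us: Iterable[int], interval_s: float) -> HttpPerf:
    """Aggregate one interval of %D values (assumed microseconds) into ms metrics."""
    times_ms = [t / 1000.0 for t in resp_times_us]
    hits = len(times_ms)
    if hits == 0:
        return HttpPerf(0.0, 0, None, None, None)
    return HttpPerf(
        hit_rate=hits / interval_s,
        hit_x_interval=hits,
        rt_avg=sum(times_ms) / hits,
        rt_max=max(times_ms),
        rt_min=min(times_ms),
    )
```

For example, `summarize([1000, 3000, 2000], 10.0)` describes three hits in a 10 second interval.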

In extended mode, the plugin gathers performance statistics for:

  • global: all requests
  • 1XX: all requests with an HTTP 1XX status code
  • 2XX: all requests with an HTTP 2XX status code
  • 3XX: all requests with an HTTP 3XX status code
  • 4XX: all requests with an HTTP 4XX status code
  • 5XX: all requests with an HTTP 5XX status code
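
The grouping above can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, and treating unparseable status fields as "global only" is a choice made for this sketch, not something the PR states.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def status_class(code: str) -> Optional[str]:
    """Map '404' to '4XX'; return None for anything that is not a 1XX-5XX code."""
    if len(code) == 3 and code.isdigit() and code[0] in "12345":
        return code[0] + "XX"
    return None


def group_by_class(samples: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group (status_code, response_time) pairs under 'global' and per class."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for code, rt in samples:
        groups["global"].append(rt)       # every request counts globally
        cls = status_class(code)
        if cls is not None:
            groups[cls].append(rt)        # and once more under its class
    return dict(groups)
```

Each group can then be summarized independently, which yields one set of http_perf values per class.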

Limits

MAX FIELDS per line = 100
If an Apache log line has more than 100 fields (with space as the field separator), the line is discarded.

MAX LINE SIZE = 16384 bytes
If an Apache log line is longer than 16384 bytes, the line is discarded.
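
A small Python sketch of these two limits, for illustration only. The `split_line` helper is hypothetical, and two details are assumptions of this sketch rather than documented plugin behaviour: the size check includes the trailing newline, and runs of blanks count as a single separator.

```python
from typing import List, Optional

MAX_FIELDS = 100
MAX_LINE_SIZE = 16384  # bytes


def split_line(raw: bytes) -> Optional[List[bytes]]:
    """Return the blank-separated fields, or None if the line must be discarded."""
    if len(raw) > MAX_LINE_SIZE:
        return None                 # line too long
    fields = raw.split()            # assumption: consecutive blanks collapse
    if len(fields) > MAX_FIELDS:
        return None                 # too many fields
    return fields
```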

Customized configuration for Graphite/Collectd users

Add these storage-aggregation rules to the /opt/graphite/conf/storage-aggregation.conf file.

[http_perf_hits]
pattern = \.hit_x_interval$
xFilesFactor = 0.5
aggregationMethod = sum

[http_perf_rt_avg]
pattern = \.rt_avg$
xFilesFactor = 0
aggregationMethod = average

[http_perf_rt_min]
pattern = \.rt_min$
xFilesFactor = 0
aggregationMethod = min

[http_perf_rt_max]
pattern = \.rt_max$
xFilesFactor = 0
aggregationMethod = max
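
To show why each metric gets a different rule, here is a rough Python model of how a roll-up combines several fine-grained points into one coarser point. It is a simplified sketch of Whisper-style aggregation, not Graphite's actual code; `roll_up` and `AGGREGATORS` are hypothetical names, and the mean is spelled `average` as in Whisper.

```python
from typing import Callable, Dict, List, Optional, Sequence

AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "average": lambda v: sum(v) / len(v),
    "min": min,
    "max": max,
}


def roll_up(points: Sequence[Optional[float]], method: str,
            xff: float) -> Optional[float]:
    """Combine fine-grained points into one value; None marks a missing point."""
    known = [p for p in points if p is not None]
    # Too few known points (below xFilesFactor) means the result is unknown.
    if not known or len(known) / len(points) < xff:
        return None
    return AGGREGATORS[method](known)
```

With `sum`, hit_x_interval keeps counting total hits at coarser resolutions, while `max` and `min` with an xFilesFactor of 0 preserve extreme response times even when most points in the window are missing.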

The plugin parses Apache logs and supports the rotatelogs tool (by tracking the most recently created file that matches a pattern).
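
The rotatelogs support can be pictured with the following Python sketch, which picks the newest file matching a glob pattern. This is an assumption about the general technique, not a transcription of the plugin's C code; `newest_matching` is a hypothetical name, and modification time is used as the ordering key.

```python
import glob
import os
from typing import Optional


def newest_matching(pattern: str) -> Optional[str]:
    """Return the most recently modified regular file matching a glob pattern."""
    candidates = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

Calling `newest_matching("/var/log/apache2/access.log*")` on each read cycle would let a reader notice when rotatelogs has started a new file.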

Build

You can now rebuild the collectd project.

# ./build.sh
# ./configure --enable-apachelog [other_configure_options]
# make
# make install

Configure The Plugin

  • Instance: Apache instance name (mandatory)

  • RenamePluginAs: (default: none) Used to place the apachelog metrics beside the apache plugin metrics, for setups that organize metrics by product rather than by plugin source.

  • UseApacheRotatedLogs: (default: false) When true, the apachelog plugin always tracks the most recently modified file matching the pattern in the <File ""> section.
    See: http://httpd.apache.org/docs/2.2/programs/rotatelogs.html

  • ExtendedMetrics: (default: false) When true, the apachelog plugin collects global and per HTTP code (1XX, 2XX, 3XX, 4XX, 5XX) performance data.

  • SetRespTimeField: (default: 0/last) Sets the position of the %D field in the Apache log line (with blanks as field separators).

  • SetHTTPCodeField: (default: 9) Sets the field position of the Apache status code (only used when ExtendedMetrics is true).

    NOTE: field positions are always specified as:
    0=last
    1=first
    2=second
    ..
    N=last (also)

    NOTE2: these positions are the defaults for the following Apache LogFormat:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D" combined

  • Example configuration file:
  LoadPlugin apachelog
  <Plugin apachelog>

    <File "/var/log/apache2/access.log*">  
      Instance "www_misite_com"
      RenamePluginAs "apache"
      UseApacheRotatedLogs "false"
      ExtendedMetrics "true"
      SetRespTimeField 0 
      SetHTTPCodeField  9  
    </File>

    <File "/var/log/apache2/access.log">  # fixed log file name
      Instance "www_misite_com"
      RenamePluginAs "apache"
      UseApacheRotatedLogs "false"
    </File>

  </Plugin>
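
The field-position convention from the notes (0 means last, 1 means first) can be checked against a sample line with a short Python sketch. The `pick_field` helper and the log line are hypothetical; the line was made up to match the LogFormat quoted in the notes.

```python
from typing import List


def pick_field(fields: List[str], position: int) -> str:
    """Resolve a 1-based field position, where 0 means 'last field'."""
    if position == 0:
        return fields[-1]
    if not 1 <= position <= len(fields):
        raise IndexError(f"line has {len(fields)} fields, wanted {position}")
    return fields[position - 1]


# Hypothetical log line; %t and "%r" each span several blank-separated fields.
line = (
    '192.0.2.1 - - [28/May/2014:10:00:00 +0200] "GET /index.html HTTP/1.1" '
    '200 512 "-" "curl/7.30" 1234'
)
fields = line.split(" ")
status = pick_field(fields, 9)      # SetHTTPCodeField default
resp_time = pick_field(fields, 0)   # SetRespTimeField default (last field)
```

Because %t contributes two fields and the request line three, the status code lands in position 9. Counting %D from the end keeps it reachable even when a User-Agent string containing blanks shifts the fields in the middle of the line.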

@toni-moreno
Contributor Author

Hi, guys.

I've been analyzing Apache logs with this new plugin in a production environment for the last 2 months, and everything seems to be OK.

What should I do to get it merged into the collectd master branch?

@otisg

otisg commented May 28, 2014

Btw. why use this and not say rsyslog?

@toni-moreno
Contributor Author

Hi Otis, I don't understand your question about rsyslog.

I wrote this plugin because our installation has a lot of Apache servers that log hits and response times to files, sometimes directly on disk and sometimes through the Apache rotatelogs tool.

In both cases the systems are in production, and it is not possible to change any existing Apache configuration.

Can you give me a hint about what you mean?

Thanks very much.

@otisg

otisg commented May 28, 2014

@toni-moreno you don't need to change Apache. Logstash and rsyslog are separate tools. Rsyslog is probably already installed and maybe even running on your Linux(?) servers. Lots of good reading material on http://blog.sematext.com/tag/rsyslog/ including performance numbers.

@toni-moreno
Contributor Author

Hi Otis, it is a very interesting article, and logstash seems a nice and powerful tool suited to a more "open" environment.

We have a lot of restrictions on deploying new tools.

We need a single tool that can collect and process information from different sources. We chose collectd because, in a single process, it can:

  • get system metrics (Linux, AIX)
  • get JMX data (java plugin)
  • get Oracle data (oracle plugin)
  • read Apache log files, with support for the rotatelogs tool (this apachelog plugin)
  • process some other generic logs (tail and csv plugins)
  • send everything to a Graphite backend

We plan to deploy on approximately 1000 to 3000 servers while minimizing changes; with this plugin, no changes are needed to any default system tool (syslog) or production server (Apache).

We also have a lot of heavily loaded Apache servers, so we prefer a tool built in C, which should be the fastest way to process Apache logs. I have not seen specific support for the rotatelogs tool in logstash, and we use it on lots of servers.

Perhaps in the long term we can redesign our performance collection system and add logstash, but it is not well suited right now.

@toni-moreno
Contributor Author

Hi guys.

I've rebased this PR to simplify the merge!

Added a README description.

Improved the metric definitions: count is now hit_rate, and a new hit_x_interval metric was added.
@toni-moreno
Contributor Author

Hi @octo, @mfournier, I've rebased this new plugin and added documentation to the man page.

I hope you can merge it into the master branch.

I have it running OK on production systems!

@toni-moreno
Contributor Author

As an example:

This is the metric tree (as seen in Graphite) that we get with this plugin when RenamePluginAs is set to "apache", so that all related metrics hang under the same instance.

[image: Graphite metric tree]

@toni-moreno
Contributor Author

Hi @octo, @pyr, I've rebased, built, and tested this PR, and all is OK!

@toni-moreno
Contributor Author

Hi @octo, @mfournier,

Do you need anything else before merging this PR?

Thank you very much.

@toni-moreno
Contributor Author

Hi @octo, @mfournier, any update on this PR?

I hope you can merge it ASAP. Feel free to ask me for any other changes to the code if you need them.

@pyr
Member

pyr commented Nov 20, 2014

Hi @toni-moreno,

This plugin is stretching collectd's problem domain a bit. I get the intent, but I'm not convinced this deserves to go in at all. I think the best place for such a plugin would be out of tree.

The recommendations to look into syslog-ng, rsyslog (one of which you might be running already) or tools like fluentd & logstash are all valid.

The fact that no one stepped in to express interest in this is also a bit concerning.

Thanks anyhow for putting in the work!

@pyr pyr closed this Nov 20, 2014
@toni-moreno
Contributor Author

Sorry @pyr, I understand and accept your decision, but I do not agree with the reasons.

On the one hand, I chose collectd because we needed a single tool to gather metrics from many different sources across a lot of Unix system products. Right now we are running collectd as our "all in one" performance metrics collector.

We cannot install other collector tools on the systems for political reasons. We know there are better-suited tools for gathering metrics from specific systems (Grid Control for Oracle, jmxtrans for JMX data, logstash for logs, etc.), but collectd has, from its origin until yesterday, integrated plugins like a Swiss Army knife, and we chose collectd precisely for that.

On the other hand, as far as I know there is no tool able to tail Apache files written through the Apache rotatelogs tool, so we decided to code this plugin for collectd. (I would be pleased to test any other tool if we knew it supported Apache rotatelogs: http://httpd.apache.org/docs/2.2/programs/rotatelogs.html)

I hope you can reconsider merging this PR. Otherwise, I suppose I could compile and build it as an external tool and link it with the libcollectdclient library, couldn't I? Do you know of any other tool that does that?

In that case we would need a small change in the tail utils so that they can skip seeking to the end of a file when they reopen it:

*src/daemon/utils_tail.c
*src/daemon/utils_tail.h

Could you accept this small backward-compatible change, so that this plugin can be compiled and built outside collectd?

Many thanks.
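
To make the requested behaviour concrete, here is a generic Python sketch of a tail reader with an option controlling whether a freshly (re)opened file is read from its start or from its end. It is not collectd's utils_tail API: the `Tail` class and its `seek_end_on_open` flag are hypothetical, and rotation is detected by comparing inode numbers, which is an assumption of this sketch.

```python
import os
from typing import List, Optional, TextIO


class Tail:
    """Follow a file; optionally start from the beginning when (re)opened."""

    def __init__(self, path: str, seek_end_on_open: bool = True) -> None:
        self.path = path
        self.seek_end_on_open = seek_end_on_open
        self._fh: Optional[TextIO] = None
        self._inode: Optional[int] = None

    def _reopen_if_needed(self) -> None:
        inode = os.stat(self.path).st_ino
        if self._fh is not None and inode == self._inode:
            return                          # still the same file
        if self._fh is not None:
            self._fh.close()
        self._fh = open(self.path, "r")
        self._inode = inode
        if self.seek_end_on_open:
            self._fh.seek(0, os.SEEK_END)   # skip content that already exists

    def read_new_lines(self) -> List[str]:
        """Return lines appended since the last call (partial lines included)."""
        self._reopen_if_needed()
        assert self._fh is not None
        return self._fh.readlines()
```

With `seek_end_on_open=False`, lines written to a new log file before the reader notices the switch are not lost, which matters when a rotation tool starts a fresh file between two read cycles.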

@chmac

chmac commented Dec 29, 2014

Just finding this PR as I'm looking for a way to pull response time from apache logs and push to statsD. I was hoping to do that with CollectD, and understood from this that it was possible. With the caveat that I'm entirely new to CollectD, it makes a lot of sense to me to use CollectD to parse Apache logs and ship the data to something like StatsD or Graphite or whatever.

@toni-moreno
Contributor Author

Hi @chmac, I encourage you to ask @pyr and @octo to reopen this PR, as they closed it just a few days ago.

Right now we are parsing Apache logs and sending the data to Graphite with this plugin.

I'm also waiting for any suggestion from @pyr on the questions I asked in my last post:

  • How can we build a collectd plugin outside the original collectd sources?
  • Can a small change be added to src/daemon/utils_tail.{c,h}, as needed by the apachelog plugin?

Thank you very much anyway.

@ghost

ghost commented Feb 16, 2015

I am sure lots of people would benefit from that plugin.
I would.
Thanks @toni-moreno.

@toni-moreno
Contributor Author

Thank you @kwinczek. I've just pushed a new small fix.

I've been working with this plugin for some months and it is working well!

Feel free to ask me anything you need about this new plugin.

@ghost

ghost commented Apr 25, 2015

Hi Tony,

Thanks for sharing your code with the outside world. I would really benefit from this plugin as well when it is included in collectd by default.

@toni-moreno
Contributor Author

Hi @MartinHerrman, I would be happy if you could add a comment to the new PR #942. (This one was closed some months ago and a new one has been opened.)

Thank you very much.
