Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Irqstat input plugin #2494

Merged
merged 8 commits into from
Apr 13, 2017

Conversation

calerogers
Copy link
Contributor

@calerogers calerogers commented Mar 5, 2017

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

@danielnelson
Copy link
Contributor

This would implement #2469

cc @phemmer

@phemmer
Copy link
Contributor

phemmer commented Mar 7, 2017

Well, kinda. The proposed output in #2469 is very different from the output implemented here. To me the output here would not be very useful as you would have to know what each interrupt is on every host. A device might be IRQ X on host A, and IRQ Y on host B. This is the reason for including the type and device tags. Additionally not all hosts have the same CPU configuration, which is the reason for the proposed total field. Also the resulting counters are presented as strings, which make mathematical operations impossible. You'd be unable to do things such as calculate the rate at which an IRQ is firing.

@calerogers
Copy link
Contributor Author

I was considering making the IRQ values integers for that reason and I think it possible to just convert the strings to int before adding it to the map. I was mostly trying to mimic the behavior produced by the collectd IRQ plugin.

@sparrc
Copy link
Contributor

sparrc commented Mar 7, 2017

@calerogers they should be integers

@calerogers
Copy link
Contributor Author

calerogers commented Mar 7, 2017

@sparrc I committed the change to convert the IRQ values to ints. @phemmer I didn't notice you put in a feature request around this with more in-depth plans on how to go about structuring the plugin, so sorry if this seems like it came out of nowhere. I have been kind of working on this (along with learning golang) on and off for last month when I've had free time.

@danielnelson
Copy link
Contributor

The schema proposed by @phemmer sees more useful to me, @calerogers do you want to switch over to this format?

@calerogers
Copy link
Contributor Author

calerogers commented Mar 11, 2017

@danielnelson I agree the schema presented in #2469 is more useful, especially the individual IRQs as separate series and the IRQ values split up by CPU and total as fields. After testing this a bit I think there could be issues parsing the type and devices due to the fact that there is not a standardized delimiter between each column in the /proc/interrupts and /proc/softirqs files. It is an arbitrary amount of spaces seemingly based on the length of the values in the columns. I wouldn't be surprised if this was a reason the collectd IRQ plugin does not parse these.

@phemmer did you have an idea around reliably parsing the varying degree of types and devices on different systems? Perhaps instead of trying to separate type and device, make one string type_device that grabs everything after the last IRQ value column so in this case the data is still gathered.

@phemmer
Copy link
Contributor

phemmer commented Mar 11, 2017

Parsing should be very easy.

  1. read first line and split with strings.Fields(). Use these as your CPU field names.
  2. read each remaining line and split each with strings.Fields().
  3. Strip trailing : off first field. Use this field as irq tag.
  4. Use fields[1:len(cpuNames)] as the values for the CPUs.
  5. Try and parse IRQ as a number.
  6. If IRQ parses as number, use fields[len(cpuNames)+1] as the type tag, and strings.Join(fields[len(cpuNames)+2:]," ") as the device tag.
  7. If IRQ fails to parse as a number, use strings.Join(fields[len(cpuNames)+1:]," ") as the type tag.

@calerogers
Copy link
Contributor Author

calerogers commented Mar 11, 2017

Yes doing that would be easy and what you have outlined was how I was going about it, but that is assuming the type column would not have any spaces. Example lines from a digital ocean VM /proc/interrupts file is this:

 0:         23   IO-APIC   2-edge      timer
 1:          9   IO-APIC   1-edge      i8042
 4:       3250   IO-APIC   4-edge      serial
 6:          3   IO-APIC   6-edge      floppy
 8:          1   IO-APIC   8-edge      rtc0

Admittedly, I am not sure how varied /proc/interrupts can be, so if you are comfortable with 6 and 7 I can do that.

The plugin reports IRQ metrics to telegraf based on IRQ instead
of CPU. It now parses `/proc/softirqs` as a separate measurement.
It also collects additional metrics including IRQ type, device
and IRQ value total.
@calerogers
Copy link
Contributor Author

calerogers commented Mar 12, 2017

The latest commit updates the irqstat plugin to collect metrics based on the specification in #2469

@calerogers calerogers force-pushed the calerogers-irqstat-plugin branch 14 times, most recently from 032e704 to 8443251 Compare March 20, 2017 04:59
@calerogers
Copy link
Contributor Author

@phemmer any thoughts on the current code?

@phemmer
Copy link
Contributor

phemmer commented Mar 21, 2017

Functionally I think it's sound. Mostly the stuff that was jumping out at me was code hygeine. But I'll leave that to @danielnelson, at least for the first round, as I tend to get a little pedantic.

@calerogers
Copy link
Contributor Author

@danielnelson any feedback you have on the plugin would be welcomed. I definitely want to continue work on merging this in. Thanks!

for _, file := range files {
data, err := ioutil.ReadFile(file)
if err != nil {
log.Fatal(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use acc.AddError here

var irqval, irqtotal int64
var irqtype, irqdevice string
fields := strings.Fields(scanner.Text())
ff := fields[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check the length of fields to ensure this doesn't cause a panic.

cpucount = len(fields)
}

if ff[len(ff)-1:] == ":" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use strings.HasSuffix. Instead of doing the rest of the code in a block, use continue:


if ff[len(ff)-1:] == ":" {
fields = fields[1:len(fields)]
irqid := ff[:len(ff)-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use strings.TrimRight

_, err := strconv.ParseInt(irqid, 10, 64)
if err == nil {
irqtype = fields[cpucount]
irqdevice = strings.Join(fields[cpucount+1:], " ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could panic if the expected fields are missing. Since we sliced off the first value, cpucount no longer represents what it is named, I think it needs to be renamed.

Copy link
Contributor Author

@calerogers calerogers Apr 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I agree. In my mind cpucount still represents exactly what it is, which is how many CPUs are present on a system, regardless whether we sliced off the first value or not. The count of how many CPUs is how one determines the position of IRQ type and device depending on whether the IRQ parses as a number or not.

I think moving the first value slicing down to after where we set the irqtype and irqdevice and instead doing irqtype = fields[cpucount+1] and increment by 1 the other places where fields[cpucount] is referenced as well. So like something like this:

        if err == nil && len(fields) > cpucount {
            irqtype = fields[cpucount+1]
            irqdevice = strings.Join(fields[cpucount+2:], " ")
        } else if len(fields) > cpucount {
            irqtype = strings.Join(fields[cpucount+1:], " ")
        }
        fields = fields[1:len(fields)]
        for i := 0; i < cpucount; i++ {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about cpucount, just make sure you do not index past where you have tested for.

if i < len(fields) {
irqval, err = strconv.ParseInt(fields[i], 10, 64)
if err != nil {
log.Fatal(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should either be returned from Gather, or use acc.AddError

irq.Tags["type"] = irqtype
irq.Tags["device"] = irqdevice
irq.Fields["total"] = irqtotal
if len(include) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combine the conditions if len(include) == 0 || if stringInSlice(...) { or have a function that handles the inclusion logic includeIRQ(IRQ) bool {. We know if we want to include it earlier in the function so it could be nice to skip the work.

1: 7 3 IO-APIC-edge i8042
NMI: 0 0 Non-maskable interrupts
LOC: 2338608687 2334309625 Local timer interrupts
MIS: 0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some softirqs entries.


got := parseInterrupts(interruptStr, include)
if len(got) == 0 {
t.Fatalf("want %+v, got %+v", parsed, got)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten up the assertions using testify: require(t, 0, len(got)). Do this throughout the function.

ID string
Fields map[string]interface{}
Tags map[string]string
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type should use the domain appropriate names for an IRQ (device, type, total) or it should be removed and Accumulator should be passed into the parseInterrupts function. You might think this will make testing difficult, but you can use testutil.Accumulator for that.

Side note: this type is essentially an internal type called Metric, but unfortunately Accumulator doesn't accept it.

@calerogers calerogers force-pushed the calerogers-irqstat-plugin branch 2 times, most recently from cfc37a3 to 84100b1 Compare April 5, 2017 03:08
@calerogers
Copy link
Contributor Author

@danielnelson thank you for reviewing the code and for the feedback. Not sure if I should reply to each individual comment of yours but the latest commit I just pushed should address them.

const sampleConfig = `
## A list of IRQs to include for metric ingestion, if not specified
## will default to collecting all IRQs.
# include = ["0", "1", "30", "NET_RX"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use the telegraf tagpass/tagdrop feature. Just to keep the number of ways to do something down to a minimum.

for _, file := range files {
data, err := ioutil.ReadFile(file)
if err != nil {
acc.AddError(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably do acc.AddError(fmt.Errorf("reading %s: %s", file, err)). So that we know which file the error is for.
It also seems like we need a continue here, to skip to the next file.

Ditto for line 107

if i < len(fields) {
irqval, err = strconv.ParseInt(fields[i], 10, 64)
if err != nil {
return irqs, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add context to the error, so that we know what failed to parse.
Maybe something like fmt.Errorf("unable to parse %q from %q: %s", fields[i], scanner.Text(), err).

}

func parseInterrupts(irqdata string, include []string) ([]IRQ, error) {
var err error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look like this variable ever gets used. The only err is on line 59, and that's a new variable that's defined within scope of the for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is returned at the end of the parseInterrupts function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you're not using it. You might as well just return irqs, nil

Copy link
Contributor Author

@calerogers calerogers Apr 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now. Thanks. Updated the code and re-committed.

@calerogers
Copy link
Contributor Author

@phemmer I agree with just using tagpass / tagdrop (didn't know this feature existed) instead of keeping a different list, I call this out in sampleConfig. Also +1 for adding context to errors. The latest commit should address your comments.

@calerogers calerogers force-pushed the calerogers-irqstat-plugin branch 2 times, most recently from d04978c to 8f8d16d Compare April 5, 2017 21:44
@calerogers
Copy link
Contributor Author

@danielnelson and @phemmer are there any other things that should be addressed before this can be merged in?

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, I haven't forgotten I'm just very busy.

These files are fairly tricky to parse, but I still feel like the complexity of the parse loop is too high and this makes it difficult to understand. I hope we can simplify it with some of these suggestions.

TASKLET: 205 0`

parsed := []IRQ{
IRQ{ID: "0", Type: "IO-APIC-edge", Device: "timer", Values: map[string]interface{}{"CPU0": int64(134), "CPU1": int64(0), "total": int64(134)}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reformat these lines so they are closer to 78 chars.

ID string
Type string
Device string
Values map[string]interface{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this always use int64 as the value type? I think what I would rather see here is a []int64 for per cpu values and a Total int64

}

func init() {
inputs.Add("irqstat", func() telegraf.Input {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I should have raised earlier, but I'm unsure about this name. Maybe it should be "interrupts", what do you think @phemmer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No argument from me on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like consensus has been made, but since I was pinged directly: I think I would favor "interrupts". Name should be reflective of what it provides, not where it gets the data. Though one could argue that "irqstat" is short for "IRQ statistics", which is reflective of what it provides.

_, err := strconv.ParseInt(irqid, 10, 64)
if err == nil && len(fields) > cpucount {
irqtype = fields[cpucount+1]
irqdevice = strings.Join(fields[cpucount+2:], " ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check that this is a valid index.

var irqval, irqtotal int64
var irqtype, irqdevice string
fields := strings.Fields(scanner.Text())
if len(fields) > 0 && fields[0] == "CPU0" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should scan the first line outside of the loop. This will reduce the complexity of this loop and prevent any strange effects if this matches somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to the for scanner.Scan() loop here? If so, how do you think I should go about parsing just the first line before actually initiating the Scan() function on the scanner?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinion on this matter, but you can call scanner.Scan() once outside the loop.

irq := NewIRQ(irqid)
_, err := strconv.ParseInt(irqid, 10, 64)
if err == nil && len(fields) > cpucount {
irqtype = fields[cpucount+1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can all be directly assigned to the IRQ struct

irqtype = strings.Join(fields[cpucount+1:], " ")
}
fields = fields[1:len(fields)]
for i := 0; i < cpucount; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this loop before parsing type and device, this way the file is parsed left to right.

for _, irq := range irqs {
tags := map[string]string{"irq": irq.ID, "type": irq.Type, "device": irq.Device}
fields := irq.Values
if file == "/proc/softirqs" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid this, write a function that takes a path, have it parse the file and return the IRQ struct, then add the fields with the appropriate measurement name. Something sort of like this:

irqs := gather("/proc/interrupts")
for irq := range irqs {
	fields := fields(irq)
	tags := tags(irq)
	acc.AddFields("interrupts", fields, tags)
}

irqs = gather("/proc/interrupts")
for irq := range irqs {
	fields := fields(irq)
	tags := tags(irq)
	acc.AddFields("soft_interrupts", fields, tags)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson I am slightly confused on this comment. Which part of this should be avoided? The current function parseInterrupts() takes in the content of each file passed to it in the first for loop as a string (irqdata, instead of taking in a path string) and returns the IRQ struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part in particular makes me feel like this loop is not structured very well if file == "/proc/softirqs" {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson ok, my only concern is that this will cause a duplication of several lines instead using the for loop over a list of file names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats okay, the pseudocode above could be improved on as well with a function AddIRQs(measurement, irqs)

@danielnelson danielnelson added this to the 1.3.0 milestone Apr 13, 2017
@calerogers calerogers force-pushed the calerogers-irqstat-plugin branch 3 times, most recently from 37addcb to ad9ad8f Compare April 13, 2017 01:18
Reformatting interrupts_test.go. Updating IRQ struct
@calerogers
Copy link
Contributor Author

@danielnelson I tried to address your comments in this last commit. I decided not to do a function since there we were only two file to parse. Let me know if there is anything else. Thanks for the help and guidance

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this to the CHANGELOG too

if err != nil {
acc.AddError(fmt.Errorf("Reading %s: %s", "/proc/interrupts", err))
}
irqdata := string(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ReadFile errors we need to skip processing the file. To avoid a lot of nesting you will probably want the function... The function could return an empty []IRQ so nothing will be ranged over.

@danielnelson danielnelson merged commit cadd845 into influxdata:master Apr 13, 2017
@danielnelson
Copy link
Contributor

Thanks!

vlamug pushed a commit to vlamug/telegraf that referenced this pull request May 30, 2017
maxunt pushed a commit that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants