mt-parrot: continuous validation by sending dummy stats and querying them back #1680

fitzoh · 2020-02-18T16:30:27Z

…for each metrictank partition.

Builds on #1665, has initial support for querying metrics back out of metrictank.

Doesn't actually emit useful stats on the results yet, but I wanted to get some feedback on the type of stats to emit.

monitor.go is probably the most interesting file at the moment

fitzoh · 2020-02-18T16:31:54Z

cmd/mt-parrot/monitor.go

+			//total number of entries where drift occurred
+			fmt.Printf("parrot.monitoring.nonMatching;partition=%d; %d\n", stats.partition, stats.numNonMatching)
+			fmt.Println()
+		}


These are the proposed/sample metrics I'm thinking about emitting... do these make sense?

can we register the metrics up front?

I think this set is pretty good.
When I think about "what could problems in the data be?", I think it could be any combination of:

1. incorrect values (val != ts): number could be off or be null.

2. too many points / not enough points / unevenly spaced points / incorrect timestamps (all points should divide by the period without remainder and be 1 period apart) / points are not in sorted order

1 is well covered by the nancount and deltaSum, but 2 isn't really covered yet.

The delay (delta between last point sent and last non-null point seen) is also interesting. "lag" seems to monitor something a bit different.
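As a rough illustration of the checks discussed above, here is a minimal Go sketch. It is not the code in monitor.go: the `point` and `seriesCheck` types, the `checkSeries` function and all field names are hypothetical, and it assumes each point was sent with its value equal to its timestamp (the "val != ts" check mentioned above).

```go
package main

import (
	"fmt"
	"math"
)

// nan is a null value, as a query would return for a missing point.
var nan = math.NaN()

// point is a hypothetical (value, timestamp) pair from a query response.
type point struct {
	val float64
	ts  int64
}

// seriesCheck is a hypothetical summary of the problems found in one series.
type seriesCheck struct {
	nanCount       int     // points with a null (NaN) value
	numNonMatching int     // non-null points where val != ts
	deltaSum       float64 // sum of |val - ts| over non-null points
	misalignedTs   int     // timestamps that are not a multiple of the period
	badSpacing     int     // neighbours not exactly 1 period apart (also catches unsorted points)
	pointCountDiff int     // got - want: positive means too many points, negative too few
	seenNonNull    bool    // whether any non-null point was seen
	delay          int64   // lastSent - ts of the last non-null point (valid if seenNonNull)
}

// checkSeries inspects the points of one series. period is the expected
// interval in seconds, wantPoints the expected number of points and
// lastSent the timestamp of the last point that was published.
func checkSeries(points []point, period int64, wantPoints int, lastSent int64) seriesCheck {
	c := seriesCheck{pointCountDiff: len(points) - wantPoints}
	for i, p := range points {
		if p.ts%period != 0 {
			c.misalignedTs++
		}
		if i > 0 && p.ts-points[i-1].ts != period {
			c.badSpacing++
		}
		if math.IsNaN(p.val) {
			c.nanCount++
			continue
		}
		if diff := math.Abs(p.val - float64(p.ts)); diff != 0 {
			c.numNonMatching++
			c.deltaSum += diff
		}
		// Overwritten on every non-null point, so it ends up
		// describing the last non-null point in the slice.
		c.seenNonNull = true
		c.delay = lastSent - p.ts
	}
	return c
}

func main() {
	pts := []point{{10, 10}, {20, 20}, {nan, 30}, {41, 40}, {nan, 50}}
	fmt.Printf("%+v\n", checkSeries(pts, 10, 6, 50))
}
```

Counting spacing violations between neighbours covers both unevenly spaced and out-of-order points with a single counter; splitting them into separate counters would make the resulting stats easier to interpret, at the cost of more metrics per partition.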

can we register the metrics up front?

So that would probably be a map of partition -> struct of partition specific metrics?

2 isn't really covered yet

I'll work something up on those

The delay (delta between last point sent and last non-null point seen) is also interesting. "lag" seems to monitor something a bit different.

Not sure what the latter portion of that means

latter portion:
parrot.monitoring.lag measures something different than what I would expect it to measure.

So that would probably be a map of partition -> struct of partition specific metrics?

or a slice, which is a bit more efficient to work with than a map (we can exploit the fact that partition IDs are a sequence starting at 0, so they work well as slice indices)
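A minimal sketch of the slice-based layout, for illustration only. The `partitionStats` type and `newPartitionStats` function are hypothetical names, and plain integer counters stand in for whatever metric types end up being registered; only the printed metric name format is taken from the snippet under review.

```go
package main

import "fmt"

// partitionStats is a hypothetical bundle of per-partition counters.
type partitionStats struct {
	nanCount       int
	numNonMatching int
	deltaSum       float64
}

// newPartitionStats allocates one entry per partition up front.
// Partition IDs run from 0 to numPartitions-1, so they can be used
// directly as slice indices, with no map lookup or hashing.
func newPartitionStats(numPartitions int) []partitionStats {
	return make([]partitionStats, numPartitions)
}

func main() {
	stats := newPartitionStats(4)

	// Record a mismatch for partition 2.
	stats[2].numNonMatching++

	for partition, s := range stats {
		fmt.Printf("parrot.monitoring.nonMatching;partition=%d; %d\n", partition, s.numNonMatching)
	}
}
```

Because every partition has an entry from the start, partitions that never report a problem still emit a zero value rather than being absent.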

can we register the metrics up front?

57151f2

1 is well covered by the nancount and deltaSum, but 2 isn't really covered yet.

36498c0

fitzoh · 2020-02-26T04:33:00Z

This shouldn't be merged until #1685 and #1687 are merged and rebased

fitzoh · 2020-03-04T05:26:14Z

I think this is about ready for another pass @Dieterbe.

Stats aren't actually being published yet as they're blocked by #1706, but I think all current comments have been addressed.

fitzoh · 2020-03-17T14:09:22Z

CI is currently failing due to parrot metrics not being defined in metrics.md...
Does it sound reasonable to update the check to exclude them?

I feel like it's potentially confusing to document them alongside the main metrictank metrics.

Dieterbe · 2020-03-19T18:32:37Z

The best solution for that would be if we can generate separate metrics.md files based on the binary.
Can you see if you can tweak https://github.com/Dieterbe/metrics2docs such that it does this?

fitzoh · 2020-03-27T07:42:16Z

The best solution for that would be if we can generate separate metrics.md files based on the binary.
Can you see if you can tweak https://github.com/Dieterbe/metrics2docs such that it does this?

This came up in a 1-1 with @woodsaj; his advice was to not worry about documenting parrot metrics for now, so I've disabled metrics2docs for the parrot command by removing the metric prefix from the docs in b2612af

woodsaj · 2020-03-27T12:39:02Z

his advice was to not worry about documenting parrot metrics for now

Yep, fixing metrics2docs is well outside the scope of this PR, so let's not let it block us from getting this work merged.

I have opened an issue for improving metrics2docs: #1731

we wouldn't want to process old ticks with a lag as that would produce incorrect outcomes

see #1680 (comment)

introduce "post nan points" concept

Dieterbe · 2020-03-30T22:29:46Z

@fitzoh I have pushed a bunch of commits that should address all my comments, as well as some feedback that was easier to just do rather than type it out.
I have not tested this yet. Once you've had a chance to go through it, maybe we can get together to clarify anything that's unclear, test it, and work out any final kinks.

Dieterbe

should be good enough.
we can do further tweaks in subsequent PRs.

fitzoh requested a review from Dieterbe February 18, 2020 16:30

fitzoh force-pushed the parrot-init branch 2 times, most recently from 2ca8897 to 51f6291 Compare February 18, 2020 20:13

fitzoh mentioned this pull request Feb 24, 2020

Add tag based constructors for simple metrics. #1685

Closed

Dieterbe changed the title ~~Add mt-parrot command to generate deterministic artificial metrics…~~ mt-parrot: continuous validation by sending dummy stats and querying them back Feb 25, 2020

Dieterbe reviewed Feb 25, 2020 (comments, now outdated and resolved, on docs/tools.md, publish/kafka/publish.go and cmd/mt-parrot/monitor.go)

fitzoh mentioned this pull request Feb 26, 2020

Ensure that metric id is always set during kafka ingestion #1687

Merged