
[FEATURE REQUEST]: As a CF Operator, I expect that when I observe the "AppInstanceExceededLogRateLimitCount" metric, I can see the app instance details that caused this value to be incremented so I can take action with the app owner #457

@sunjayBhatia

Description

Is your feature request related to a problem?

Right now, AppInstanceExceededLogRateLimitCount is emitted (here) as a per-cell tagged counter metric. The loggregator agent adds only the cell-specific tags to the metric, nothing more, resulting in metrics that look like this:

deployment:"cf" job:"diego-cell" index:"0e98fd00-47b2-4589-94f0-385f78b3a04d" ip:"10.0.1.12" tags:<key:"instance_id" value:"0e98fd00-47b2-4589-94f0-385f78b3a04d" > tags:<key:"source_id" value:"rep" > counterEvent:<name:"AppInstanceExceededLogRateLimitCount" delta:1 total:206 >

As a result, operators must do additional work to figure out which app instance on the emitting cell is the actual culprit of the chatty logging. This is not straightforward, and we can do better so that operators can easily identify problematic apps (and even individual app instances).

Describe the solution you'd like

In the log rate limit reporter, we already have access to the metric tags set on the app's desired LRP and use them to push our own log line into the app's log stream. We should be able to do the same thing with the AppInstanceExceededLogRateLimitCount metric: tag it accordingly so that the value emitted is a per-app-instance metric rather than a per-cell metric.

We can potentially just add a new variant of our IncrementCounter method that attaches tags to the envelope we want to send, using this option (see the sketch below).

  • Diego repo
  • executor
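
A minimal sketch of what that tag-aware counter variant could look like, assuming a go-loggregator-style ingress client. The IncrementCounterWithTags name, the client wrapper, and the package/import version are assumptions (the real IncrementCounter sits behind Diego's logging client wrapper); EmitCounter, WithDelta, and WithEnvelopeTag are the upstream go-loggregator option API:

```go
package metrics

// Sketch only: IncrementCounterWithTags and the client wrapper are
// hypothetical. EmitCounter, WithDelta, and WithEnvelopeTag are
// go-loggregator's option-style API (import path/version may differ).

import (
	loggregator "code.cloudfoundry.org/go-loggregator"
)

const AppInstanceExceededLogRateLimitCount = "AppInstanceExceededLogRateLimitCount"

type client struct {
	ingress *loggregator.IngressClient
}

// IncrementCounterWithTags emits a counter envelope with a delta of 1 and the
// given tags attached, so the metric carries per-app-instance identity (e.g.
// the metric tags from the app's desired LRP) instead of only the cell-level
// tags added by the loggregator agent.
func (c *client) IncrementCounterWithTags(name string, tags map[string]string) {
	opts := []loggregator.EmitCounterOption{loggregator.WithDelta(1)}
	for k, v := range tags {
		// Explicit conversion keeps the generic envelope-tag option usable
		// as a counter option.
		opts = append(opts, loggregator.EmitCounterOption(loggregator.WithEnvelopeTag(k, v)))
	}
	c.ingress.EmitCounter(name, opts...)
}
```

The caller in the log rate limit reporter would then pass the container's metric tags from the desired LRP, something like c.IncrementCounterWithTags(AppInstanceExceededLogRateLimitCount, lrpMetricTags); the exact plumbing for lrpMetricTags is TBD.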

Describe alternatives you've considered

  • An alternative is to implement the emission logic here instead, keeping it in the log streamer package. That package already has reporters that periodically emit per-container (app instance) metrics.
    • Maybe implement another metric reporter that is available to the log streamer.
    • IMO, moving the actual call to the loggregator/metron client out to a goroutine separate from the main functionality is "safer": the way this code is currently written, if the call to loggregator blocks, the work we care about is blocked. I believe we use the parallel-goroutine metric emission pattern a lot for exactly this purpose, and to keep metric emission logic/periodicity in one place so we don't have to re-implement periodic metrics over and over (see the sketch after this list).
  • If the "counter" type metric does not work, we can try to use a gauge style metric instead
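To make that last point concrete, here is a minimal sketch of the parallel-goroutine pattern under assumed names (limitExceededReporter, counterEmitter, Tick/Run are all hypothetical, not existing executor code): the log streamer records exceedances in memory, and a separate reporter goroutine periodically flushes tagged deltas to the metron client, so a blocked loggregator call never stalls log streaming.

```go
package logstreamer

// Sketch only, illustrating the "parallel goroutine for metric emission"
// pattern described above; none of these names exist in the executor.

import (
	"sync"
	"time"
)

// counterEmitter stands in for the metron/loggregator client. If a call here
// blocks, only the reporter goroutine is affected, never the log streamer.
type counterEmitter interface {
	IncrementCounterWithDelta(name string, delta uint64, tags map[string]string) error
}

type limitExceededReporter struct {
	emitter  counterEmitter
	interval time.Duration

	mu     sync.Mutex
	counts map[string]uint64            // app-instance key -> pending delta
	tags   map[string]map[string]string // app-instance key -> metric tags
}

func newLimitExceededReporter(e counterEmitter, interval time.Duration) *limitExceededReporter {
	return &limitExceededReporter{
		emitter:  e,
		interval: interval,
		counts:   map[string]uint64{},
		tags:     map[string]map[string]string{},
	}
}

// Tick is called by the log streamer whenever an app instance exceeds its log
// rate limit. It only touches in-memory state, so it can never block on
// loggregator.
func (r *limitExceededReporter) Tick(instanceKey string, metricTags map[string]string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts[instanceKey]++
	r.tags[instanceKey] = metricTags
}

// Run flushes pending deltas on its own goroutine at a fixed interval,
// keeping emission logic and periodicity in one place.
func (r *limitExceededReporter) Run(stop <-chan struct{}) {
	ticker := time.NewTicker(r.interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			r.flush()
		}
	}
}

func (r *limitExceededReporter) flush() {
	r.mu.Lock()
	pending, pendingTags := r.counts, r.tags
	r.counts = map[string]uint64{}
	r.tags = map[string]map[string]string{}
	r.mu.Unlock()

	for key, delta := range pending {
		// A failed or slow emission only delays metrics, not log streaming.
		_ = r.emitter.IncrementCounterWithDelta("AppInstanceExceededLogRateLimitCount", delta, pendingTags[key])
	}
}
```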
