Conversation

Pinging @elastic/kibana-platform (Team:Platform)
```ts
it('collects event loop delay', async () => {
  const metrics = await collector.collect();
  expect(metrics.event_loop_delay).toBeGreaterThan(0);
});
```
Mocking `getEventLoopDelay` is a pain, so I only tested that we collect an actual > 0 value here.
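For context, one common way to measure event loop delay (and roughly what the oppsy-derived collectors do; the exact implementation in this PR is an assumption here) is to time a `setImmediate` round trip:

```ts
// Hypothetical sketch: measure how long a setImmediate callback is
// delayed beyond "immediately", in milliseconds. Any synchronous work
// blocking the event loop inflates this value.
function getEventLoopDelay(): Promise<number> {
  return new Promise(resolve => {
    const bench = process.hrtime();
    setImmediate(() => {
      const [seconds, nanoseconds] = process.hrtime(bench);
      resolve(seconds * 1e3 + nanoseconds / 1e6);
    });
  });
}
```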
```ts
describe('ServerMetricsCollector', () => {
  let server: HttpService;
  let collector: ServerMetricsCollector;
  let hapiServer: HapiServer;
  let router: IRouter;
```
The ServerMetricsCollector collects data from the hapi server. Unit testing it was a pain, as a lot of the hapi server's methods and behavior would need to be properly mocked, so I wrote an integration test instead (I also think this is more appropriate than a unit test for this specific collector).
This is trying to test this part of the code:

```ts
const connections = await new Promise<number>(resolve => {
  this.server.listener.getConnections((_, count) => {
    resolve(count);
  });
});
```

I thought `this.server.listener.getConnections` was returning the number of opened/pending connections, so I tried to test it by keeping waiting handlers, but the test fails (`metrics.concurrent_connections` is always equal to 0).

The snippet is directly copied from `src/legacy/server/status/index.js` (lines 41 to 49 at 8e9a8a8), so it's very unlikely it doesn't work (or at least doesn't do what it's supposed to do). However, I don't know how to properly integration test it (I could unit test it by mocking `server.listener.getConnections`, but mocking the whole hapi server just for that felt like overkill).

If someone has an idea...
`sendGet` doesn't send a request; you need to call `end` with a callback. https://visionmedia.github.io/superagent/

```ts
supertest(hapiServer.listener)
  .get('/')
  .end(() => null);
```
It does when you await it. Using both `end` and awaiting it afterwards throws an error stating that.

Oh, but I'm not awaiting it here... Thanks.
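Putting the thread's resolution together, a working version of the concurrent-connections test could look roughly like this. This is a sketch under assumptions: the route path, the never-resolving handler, and the exact assertion are illustrative, not the PR's final test, and `router`, `server`, `hapiServer`, and `collector` come from the test setup quoted above:

```ts
it('collects the number of concurrent connections', async () => {
  // A handler that never resolves keeps its connection open.
  router.get({ path: '/hanging', validate: false }, () => new Promise(() => {}));
  await server.start();

  // Dispatch requests without awaiting them: per the superagent docs,
  // calling `end` (or awaiting) is what actually sends the request.
  supertest(hapiServer.listener).get('/hanging').end(() => null);
  supertest(hapiServer.listener).get('/hanging').end(() => null);
  // (a short wait may be needed here for the connections to be established)

  const metrics = await collector.collect();
  expect(metrics.concurrent_connections).toBe(2);
});
```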
```ts
return {
  getOpsMetrics$: () => metricsObservable,
};
```
I exposed this in the setup API, even if the observable doesn't emit until core's start phase. Tell me if we prefer moving it to the start API instead.
I think it makes sense if we implement lazy logic
Makes sense to me to maintain the pattern of "register things in setup"
```ts
export class MetricsService
  implements CoreService<InternalMetricsServiceSetup, InternalMetricsServiceStart> {
  private readonly logger: Logger;
  private metricsCollector?: OpsMetricsCollector;
  private collectInterval?: NodeJS.Timeout;
  private metrics$ = new ReplaySubject<OpsMetrics>(1);
```
I created a metrics module / service instead of naming it ops. I thought that if at some point we want to expose other metrics, it would make more sense to have a global service for that.
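As a rough illustration of how these pieces fit together, here is a simplified, hypothetical sketch of the lifecycle (not the PR's exact code; `OpsMetrics` and the collector shape are assumed from the snippets above): `setup()` exposes the observable right away, `start()` begins the collection loop, so subscribers only receive values once the start phase has run.

```ts
import { ReplaySubject } from 'rxjs';

// `OpsMetrics` is assumed to be the interface defined elsewhere in this PR.
class MetricsServiceSketch {
  private collectInterval?: NodeJS.Timeout;
  private readonly metrics$ = new ReplaySubject<OpsMetrics>(1);

  constructor(private readonly collector: { collect(): Promise<OpsMetrics> }) {}

  public setup() {
    // Exposed from setup, but nothing is emitted yet.
    return { getOpsMetrics$: () => this.metrics$.asObservable() };
  }

  public start(collectionIntervalMs: number) {
    // The ReplaySubject(1) means late subscribers get the latest sample.
    this.collectInterval = setInterval(async () => {
      this.metrics$.next(await this.collector.collect());
    }, collectionIntervalMs);
    return { getOpsMetrics$: () => this.metrics$.asObservable() };
  }

  public stop() {
    if (this.collectInterval) clearInterval(this.collectInterval);
    this.metrics$.complete();
  }
}
```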
```ts
export interface OpsMetrics {
  /** Process related metrics */
  process: OpsProcessMetrics;
  /** OS related metrics */
  os: OpsOsMetrics;
  /** server response time stats */
  response_times: OpsServerMetrics['response_times'];
  /** server requests stats */
  requests: OpsServerMetrics['requests'];
  /** number of current concurrent connections to the server */
  concurrent_connections: OpsServerMetrics['concurrent_connections'];
}
```
Waiting for @chrisronline to reply to #46563 (comment) to know whether we can regroup the server metrics under a `server` property instead of exposing them all at the root level as was done in legacy.
Are these in snake_case format just to maintain compatibility with legacy? Seems like we should rename to camelCase
It is, see #46563. If we think this is not acceptable and that consumers should adapt, I could both rename everything to camelCase and create the `server` property I spoke about. Not sure how much the existing structure is allowed to move; maybe you can answer?
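For illustration, the regrouped camelCase shape being discussed might look like this (purely hypothetical; this is not the interface the PR ships):

```ts
// Hypothetical alternative shape: server metrics regrouped under a
// `server` key, with everything renamed to camelCase.
export interface OpsMetricsAlternative {
  process: OpsProcessMetrics;
  os: OpsOsMetrics;
  server: {
    responseTimes: OpsServerMetrics['response_times'];
    requests: OpsServerMetrics['requests'];
    concurrentConnections: OpsServerMetrics['concurrent_connections'];
  };
}
```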
nit: what if we use a more descriptive duration type? `schema.duration({ defaultValue: '5s' })`
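In context, the suggestion would make the ops config look roughly like this (a sketch; the `ops` config path and the `interval` key name are assumptions, the `schema.duration` call is as suggested above):

```ts
import { schema, TypeOf } from '@kbn/config-schema';

// Hypothetical ops config using schema.duration instead of a raw number.
export const opsConfig = {
  path: 'ops',
  schema: schema.object({
    interval: schema.duration({ defaultValue: '5s' }),
  }),
};

export type OpsConfigType = TypeOf<typeof opsConfig.schema>;
```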
nit: could be created once instead of recreating on every call
nit: `platform: NodeJS.Platform`
```ts
  max: 0,
};

constructor(private readonly server: HapiServer) {
```
What if we prevent hapi server leakage and invert dependencies? The HTTP server could implement the collect interface instead.
We are inside core and no hapi reference is leaking outside of it, so I would say it's alright. But I can move the server collector inside http and expose it from the internal http contract if you think this is better / more future proof.

> We are inside core and no hapi reference is leaking outside of it, so I would say it's alright. But I can move the server collector inside http and expose it from the internal http contract if you think this is better / more future proof.

It's up to you, but I'd prefer to have one place to update if we decide to get rid of hapi one day.
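A sketch of the inversion being discussed, under assumed names (the interface and contract shapes here are illustrative, not the actual internal http contract):

```ts
// Hypothetical shape: a generic collector interface lives in the metrics
// module, and the http module provides the server implementation, so the
// hapi types never leave the http module.
export interface MetricsCollector<T> {
  collect(): Promise<T>;
}

// What the internal http setup contract could expose:
export interface InternalHttpServiceSetupSketch {
  getServerMetricsCollector(): MetricsCollector<OpsServerMetrics>;
}
```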
retest
```ts
  free_in_bytes: os.freemem(),
  used_in_bytes: os.totalmem() - os.freemem(),
},
uptime_in_millis: os.uptime() * 1000,
```
Same as other comment, can we rename to camelCase?
Remaining points to decide on (quoted with answers in the review below):

Don't have a strong opinion on any of them, but we need a consensus.
mshustov left a comment
- Should we use an interval observable instead of `setInterval` (that would force moving the `getOpsMetrics$` API to the start contract)? - #58623 (comment)

  not necessary (see the sketch after this list)

- Should we move the `server` collector to the `http` module to avoid leakage of the `hapi` API outside of the `http` module? - #58623 (comment)

  optional. #58623 (comment)

- Can we take this NP migration opportunity to change the ops metrics structure (at least in core)? - #58623 (comment)

  can be done as a follow-up
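For reference, the interval-observable alternative from the first point could look like this (a sketch; `collector` and `collectionIntervalMs` are assumed from context):

```ts
import { interval } from 'rxjs';
import { switchMap, shareReplay } from 'rxjs/operators';

// Hypothetical rxjs-based collection loop replacing setInterval: each
// tick triggers a collect(), and late subscribers replay the last value.
const metrics$ = interval(collectionIntervalMs).pipe(
  switchMap(() => collector.collect()),
  shareReplay(1),
);
```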
Created #59113 to track the follow-up improvements
💚 Build Succeeded
* create base service and collectors
* wire the service into server, add mock
* add collector tests
* add main collector test
* export metric types from server
* add service and server tests
* updates generated doc
* improve doc
* nits and comments
* add disconnected requests test
How do I access this in NP? This doesn't have a [...]
@chrisronline Wow. I guess I just forgot to add it to the public contract. Will address that asap.
Created #59294
I'm comparing the data returned from this new API to what currently exists in the monitoring code base and I'm seeing an issue (there might be more, but this is the first I dug into).

In the existing system, we request data which then "flushes" the event rolling system, which "resets" the existing state. So essentially, all reported values only cover the window since the previous collection.

However, this API seems to buffer the data for the entire duration of the process - resulting in different reported values from the existing system.

Perhaps we should add a way to flush the system, or potentially introduce instances of the metrics collector that can support flushing some intermediate state (so as to preserve the idea that the main metrics collector never resets its local state)? I don't know if there are other plugins that depend on these data or not.
Yea, it seems I missed the fact that the hapi network monitor actually resets its state / requests after every collection. This affects both the monitoring plugin and the oss server monitoring ([...]).

Leading to: I think falling back to oppsy behavior, by resetting the network collector after every collection (every [...]), is the way to go. WDYT?
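A minimal sketch of the reset-on-collect behavior being agreed on here, with a hypothetical request counter (the class and method names are illustrative, not the PR's collector):

```ts
// Hypothetical windowed counter mirroring what oppsy does: each collect()
// returns stats for the window since the previous collection, then
// clears its local state.
class WindowedRequestCounter {
  private total = 0;
  private disconnects = 0;

  onRequest() { this.total++; }
  onDisconnect() { this.disconnects++; }

  collect() {
    const snapshot = { total: this.total, disconnects: this.disconnects };
    this.reset(); // stats never span more than one collection interval
    return snapshot;
  }

  private reset() {
    this.total = 0;
    this.disconnects = 0;
  }
}
```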
Sure, that works for me! Thanks!
@chrisronline created #59551
Summary

Fix #46563

- Add a `metrics` core service, and expose the associated API in core's public `setup` API.
- `OpsMetrics` reproduces the structure generated in `src/legacy/server/status/lib/metrics.js`.
- The `core` collectors implementation no longer relies on `oppsy` (but is based on the oppsy implem). Once all usages have been adapted to use this new core API, we should be able to remove `oppsy` from our dependencies.

Checklist
For maintainers
Dev Docs

A new `metrics` API is available from core, allowing retrieval of various metrics regarding the http server, process, and OS load/usage.
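A hedged usage sketch from a plugin's `setup`, assuming the public-contract exposure tracked in #59294 (the `core.metrics` property path and the plugin class are assumptions for illustration):

```ts
import { CoreSetup, Plugin } from 'kibana/server';

// Hypothetical consumer: subscribe to the ops metrics observable
// exposed on core's setup contract.
export class MyMetricsConsumerPlugin implements Plugin {
  public setup(core: CoreSetup) {
    core.metrics.getOpsMetrics$().subscribe(metrics => {
      console.log(`event loop delay: ${metrics.process.event_loop_delay}ms`);
      console.log(`concurrent connections: ${metrics.concurrent_connections}`);
    });
  }

  public start() {}
}
```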