Skip to content

Conversation

@kobelb
Copy link
Contributor

@kobelb kobelb commented Sep 15, 2020

When Kibana was under heavy load, the lack of concurrency control around license fetching was causing issues.

Previously, every 30 seconds on an interval Kibana would poll for a new license. If Kibana's event-loop was delayed for a significant period of time preventing the normal interval from firing, once Kibana's event-loop was freed up we'd see a number of requests to refresh the license all at the same time. This is resolved by switching the switchMap to an exhaustMap, which was done in 12441f5.

Additionally, the licensing plugin was using an onPreResponse handler to refresh the license whenever the response status code was >= 400. Fleet is responding with an explicit 429 when there are too many concurrent agents, and we don't need to refresh the license in these scenarios. This is addressed by 51fbd96 which explicitly ignores 429 error codes.

The license was being refreshed per response, so if 100 requests all errored out we'd try and refresh the license 100 times. There's no reason to refresh the license per request, or when the license is already being refreshed on the interval. This is addressed by 4f47771

When it's ES is slow to respond with a license, or Kibana is overloaded
and is slow making the request or handling the response, we were trying
to fetch the license again. This will just skip that refresh interval,
and catch it the next time around.
@kobelb kobelb requested a review from a team as a code owner September 15, 2020 22:43
@kobelb kobelb added v8.0.0 v7.10.0 release_note:skip Skip the PR/issue when compiling release notes Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// labels Sep 15, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

expect(value.type).toBe('gold');
refresh();
// if we just call refresh() immediately here, the exhaustMap de-dupes the call
Promise.resolve().then(() => refresh());
Copy link
Contributor Author

@kobelb kobelb Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a bug in RxJs. It's not clear to me why this doesn't work as expected https://stackblitz.com/edit/exhaustmap-1 but https://stackblitz.com/edit/exhaustmap-2 and https://stackblitz.com/edit/exhaustmap-3 do...

Does this makes sense to you @pgayvallet?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a bug or an implementation subtlety when the consumer is synchronous... @rudolf do you have an opinion?

Anyway, if it's working with the Promise.resolve trick, the comment is probably good enough.

May want to change the Promise.resolve with a setImmediate though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear to be related to the synchronous subscriber. subscribeToPromise is first calling subscriber.next() before immediately calling subscriber.complete() here. If the subscription to exhaustMap is synchronous, it doesn't have the chance to mark the subscriber as "complete" before we emit another value onto the subject.

In my simplistic mental model, it shouldn't matter that the subscription is synchronous. Is that not the case in your alls experience?

With regard to Promise.resolve vs setImmediate, either will work. After giving this more thought, I think we should use process.nextTick. Borrowing a diagram of the event-loop from https://blog.insiderattack.net/timers-immediates-and-process-nexttick-nodejs-event-loop-part-2-2c53fd511bb we don't necessarily need to wait until the event-loop is in the "Immediate Queue" phase, we just need some mechanism of allowing subscriber.complete() to be called. The Promise.resolve approach will run this operation as part of the "Microtasks", which works, but is somewhat confusing. Where-as process.nextTick will perform a similar function but more explicitly. setImmediate will cause this to be run during a different phase of the event-loop, which works, but we don't necessarily need this to be run in a specific phase:

1_2yXbhvpf1kj5YT-m_fXgEQ

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear to be related to the synchronous subscriber. subscribeToPromise is first calling subscriber.next() before immediately calling subscriber.complete() here. If the subscription to exhaustMap is synchronous, it doesn't have the chance to mark the subscriber as "complete" before we emit another value onto the subject.

Might be a bug in rxjs implementation then.

In my simplistic mental model, it shouldn't matter that the subscription is synchronous

Yea, it shoudn't, theoretically. However in my experience, it does sometimes 😄. TBH I always wondered why they choose to handle subscriptions / emissions synchronously. This always leads to some borderline cases.

With regard to Promise.resolve vs setImmediate, either will work. After giving this more thought, I think we should use process.nextTick

Fine with me. TBH, for such a 'small' delayed execution, it probably doesn't matter much. But I agree with you on paper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'm going to proceed with this approach and open an issue in the RxJs repo about synchronous subscriptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kobelb
Copy link
Contributor Author

kobelb commented Sep 18, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

page load bundle size

id value diff baseline
licensing 26.8KB -881.0B 27.6KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@kobelb kobelb merged commit 3d67eaa into elastic:master Sep 18, 2020
@kobelb kobelb deleted the license-fetching branch September 18, 2020 16:01
kobelb added a commit that referenced this pull request Sep 18, 2020
* Switching license polling from a switchMap to an exhaustMap

When it's ES is slow to respond with a license, or Kibana is overloaded
and is slow making the request or handling the response, we were trying
to fetch the license again. This will just skip that refresh interval,
and catch it the next time around.

* Explicitly ignoring a 429 from triggering a license refresh

* The existing unit tests pass!

* Only refreshing the license once at a time

* Adding test for the onPreResponse licensing handler

* Removing errant newline

* Removing errant 'foo'

* Now with better comments!

* Fixing oddity with the exhaustMap

* Just a bit of tidying up

* Use process.nextTick() instead of the confusing Promise.resolve

Co-authored-by: Elastic Machine <[email protected]>

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release_note:skip Skip the PR/issue when compiling release notes Team:Core Platform Core services: plugins, logging, config, saved objects, http, ES client, i18n, etc t// v7.10.0 v8.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants