-
Notifications
You must be signed in to change notification settings - Fork 8.5k
License fetching concurrency #77560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
License fetching concurrency #77560
Conversation
When it's ES is slow to respond with a license, or Kibana is overloaded and is slow making the request or handling the response, we were trying to fetch the license again. This will just skip that refresh interval, and catch it the next time around.
|
Pinging @elastic/kibana-platform (Team:Platform) |
pgayvallet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| expect(value.type).toBe('gold'); | ||
| refresh(); | ||
| // if we just call refresh() immediately here, the exhaustMap de-dupes the call | ||
| Promise.resolve().then(() => refresh()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there might be a bug in RxJs. It's not clear to me why this doesn't work as expected https://stackblitz.com/edit/exhaustmap-1 but https://stackblitz.com/edit/exhaustmap-2 and https://stackblitz.com/edit/exhaustmap-3 do...
Does this makes sense to you @pgayvallet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if a bug or an implementation subtlety when the consumer is synchronous... @rudolf do you have an opinion?
Anyway, if it's working with the Promise.resolve trick, the comment is probably good enough.
May want to change the Promise.resolve with a setImmediate though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does appear to be related to the synchronous subscriber. subscribeToPromise is first calling subscriber.next() before immediately calling subscriber.complete() here. If the subscription to exhaustMap is synchronous, it doesn't have the chance to mark the subscriber as "complete" before we emit another value onto the subject.
In my simplistic mental model, it shouldn't matter that the subscription is synchronous. Is that not the case in your alls experience?
With regard to Promise.resolve vs setImmediate, either will work. After giving this more thought, I think we should use process.nextTick. Borrowing a diagram of the event-loop from https://blog.insiderattack.net/timers-immediates-and-process-nexttick-nodejs-event-loop-part-2-2c53fd511bb we don't necessarily need to wait until the event-loop is in the "Immediate Queue" phase, we just need some mechanism of allowing subscriber.complete() to be called. The Promise.resolve approach will run this operation as part of the "Microtasks", which works, but is somewhat confusing. Where-as process.nextTick will perform a similar function but more explicitly. setImmediate will cause this to be run during a different phase of the event-loop, which works, but we don't necessarily need this to be run in a specific phase:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does appear to be related to the synchronous subscriber. subscribeToPromise is first calling subscriber.next() before immediately calling subscriber.complete() here. If the subscription to exhaustMap is synchronous, it doesn't have the chance to mark the subscriber as "complete" before we emit another value onto the subject.
Might be a bug in rxjs implementation then.
In my simplistic mental model, it shouldn't matter that the subscription is synchronous
Yea, it shoudn't, theoretically. However in my experience, it does sometimes 😄. TBH I always wondered why they choose to handle subscriptions / emissions synchronously. This always leads to some borderline cases.
With regard to Promise.resolve vs setImmediate, either will work. After giving this more thought, I think we should use process.nextTick
Fine with me. TBH, for such a 'small' delayed execution, it probably doesn't matter much. But I agree with you on paper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'm going to proceed with this approach and open an issue in the RxJs repo about synchronous subscriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@elasticmachine merge upstream |
💚 Build SucceededBuild metricspage load bundle size
History
To update your PR or re-run it, just comment with: |
* Switching license polling from a switchMap to an exhaustMap When it's ES is slow to respond with a license, or Kibana is overloaded and is slow making the request or handling the response, we were trying to fetch the license again. This will just skip that refresh interval, and catch it the next time around. * Explicitly ignoring a 429 from triggering a license refresh * The existing unit tests pass! * Only refreshing the license once at a time * Adding test for the onPreResponse licensing handler * Removing errant newline * Removing errant 'foo' * Now with better comments! * Fixing oddity with the exhaustMap * Just a bit of tidying up * Use process.nextTick() instead of the confusing Promise.resolve Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Elastic Machine <[email protected]>

When Kibana was under heavy load, the lack of concurrency control around license fetching was causing issues.
Previously, every 30 seconds on an interval Kibana would poll for a new license. If Kibana's event-loop was delayed for a significant period of time preventing the normal interval from firing, once Kibana's event-loop was freed up we'd see a number of requests to refresh the license all at the same time. This is resolved by switching the
switchMapto anexhaustMap, which was done in 12441f5.Additionally, the licensing plugin was using an onPreResponse handler to refresh the license whenever the response status code was >= 400. Fleet is responding with an explicit
429when there are too many concurrent agents, and we don't need to refresh the license in these scenarios. This is addressed by 51fbd96 which explicitly ignores429error codes.The license was being refreshed per response, so if 100 requests all errored out we'd try and refresh the license 100 times. There's no reason to refresh the license per request, or when the license is already being refreshed on the interval. This is addressed by 4f47771