-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add serverWillStop lifecycle hook; call stop() on signals by default #4450
Conversation
Some thoughts:
|
ff2de06
to
1258777
Compare
2faa90d
to
3d61611
Compare
I put a lot of effort into actually writing tests for signal handling. But for some reason in the context of the test suite, One caveat is that if you have multiple ApolloServers in your process, they'll both begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments within, but this LGTM!
const nodeEnv = process.env.NODE_ENV; | ||
delete process.env.NODE_ENV; | ||
const samplePath = '/innerSamplePath'; | ||
|
||
const rewiredServer = new ApolloServer({ | ||
typeDefs, | ||
resolvers, | ||
playground: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this just a hacky workaround to make playground render in a testing env?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about this as the opposite — setting NODE_ENV to mean "render playground" is a hacky workaround, whereas given that this is a test of playground functionality, asking for what you need makes sense. I left in some tests which set NODE_ENV which are explicitly saying "make sure playground is on by default in production" though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're on the same page. This was a hacky workaround, but I prefer the change you made here. Just confirming my understanding of the change.
@@ -25,6 +25,35 @@ describe('apollo-server', () => { | |||
expect(() => new ApolloServer({ typeDefs, mocks: true })).not.toThrow; | |||
}); | |||
|
|||
it('runs serverWillStart and serverWillStop', async () => { | |||
const fn = jest.fn(); | |||
const oneTick = () => new Promise((res) => process.nextTick(() => res())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is neat, TIL. What does the additional tick do for testing that isn't just awaiting a "nothing promise" a la await new Promise(res => res())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also shorten this slightly by doing this:
const oneTick = () => new Promise((res) => process.nextTick(res));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I forgot that Promises always resolve asynchronously! @trevor-scheer yours looks better.
@@ -155,6 +155,7 @@ const port = 0; | |||
server = new ApolloServer({ | |||
typeDefs, | |||
resolvers, | |||
handleSignals: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For every test we add this it's because we're removing the test
from process.env.NODE_ENV
, right? I dunno that I want to see a comment for every one or maybe I do, but the necessity of introducing these may be unclear to a future reader / modifier of these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. One thought would be to add a helper that does "set NODE_ENV and pass handleSignals: false" but that's hard because all these tests are making different ApolloServer classes? Not sure where to put the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can't say I have a good suggestion here, only food for thought. I don't think this comment is a blocker for lack of a proposed solution on my end.
itself. Set this to false to disable. You can manually invoke 'stop()' and | ||
'sendReport()' on other signals if you'd like. Note that 'sendReport()' | ||
does not run synchronously so it cannot work usefully in an 'exit' handler. | ||
For backwards compatibility only; specifying `new ApolloServer({engine: {handleSignals: false}})` is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a deprecation note similar to the ones immediately above? Most importantly, I'd just like to see the actual word "deprecated" here somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do (next week)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'd rather this not merge until #4453 merges too, at which point this entire EngineReportingOptions section will be deprecated.
this.toDispose.add(async () => { | ||
await Promise.all( | ||
serverListeners.map(({ serverWillStop }) => serverWillStop?.()), | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the additional async/await
here is just extraneous (though harmless)
this.toDispose.add(async () => { | |
await Promise.all( | |
serverListeners.map(({ serverWillStop }) => serverWillStop?.()), | |
); | |
}); | |
this.toDispose.add(() => { | |
return Promise.all( | |
serverListeners.map(({ serverWillStop }) => serverWillStop?.()), | |
); | |
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think harmless is probably a true assessment, particularly since this is not a hot-spot in the code from a performance standpoint, but I think it's worth noting that it does create an additional Promise which needs to be resolved and await always yields to the event loop so it would be processed on the next tick. (Might even add an entry to the stack?) If we were doing it often, there might be memory implications. While some runtimes might (eventually) optimize it out via an optimization known as tail call optimization, that is not an optimization that exists in V8 today and it may never land in many runtimes (See link).
Since this is a shutdown mechanism, we might actually be counting down the ticks until we can terminate, though I don't think anything here would be anything more than a microtask so I don't believe we're actually putting anything on the next full turn of the event loop.
I might say that returning the Promise directly is probably preferred. However, to be very clear, this is just me trying to shed light on Promise execution dynamics, not me asking for a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion here doesn't typecheck because toDispose is supposed to return ValueOrPromise<void>
but when I return Promise.all it ends up with ValueOrPromise<(void | undefined)[]>
. (The code is semantically different as I had not written return await
.) I guess we could change the typing on toDispose, or I could throw in a .then(() => {})
or something, but the async/await version seems clearer to me than either of those, and this is not performance-critical code. Let me know if you disagree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. .then
would work, but not worth changing. Fine with me!
logger: logger || console, | ||
schema, | ||
schemaHash, | ||
engine: {}, | ||
}); | ||
if (maybeListener && maybeListener.serverWillStop) { | ||
serverListener = maybeListener; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but what are your thoughts of calling maybeListener
to pluginInstanceListener
? That seems a bit less ambiguous to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to maybeServerListener which matches the type better
@@ -25,6 +25,35 @@ describe('apollo-server', () => { | |||
expect(() => new ApolloServer({ typeDefs, mocks: true })).not.toThrow; | |||
}); | |||
|
|||
it('runs serverWillStart and serverWillStop', async () => { | |||
const fn = jest.fn(); | |||
const oneTick = () => new Promise((res) => process.nextTick(() => res())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also shorten this slightly by doing this:
const oneTick = () => new Promise((res) => process.nextTick(res));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, had a couple of minor non-blocking comments
Is this something you need help digging into? If so, do you have steps to reproduce? |
I'm not really fully at my computer til Monday, but I was running (via
and it appeared to kill the whole jest situation. Can you reproduce that or should I try harder? |
I think I'm reproducing what you're seeing. I added a test to one of the suites (/packages/apollo-server/src/tests/index.test.ts in my case), which is as follows:
It's causing this individual test suite to fail, with an error of "Call retries were exceeded", while all other test suites pass (106 pass). I'm assuming you're wanting to add a one-time listener to SIGINT to introduce assertions within the associated handler for various test scenarios. Hopefully, I've reproduced the same issue you're encountering and made the correct assumption about how you're trying to test. I'll continue digging into this a bit tonight, but if I don't figure it out shortly I'll probably need to pick this up tomorrow after work due to it getting late (I'm on the east coast). Edit: Assuming I'm on the same page as you, I made a little progress. It seems this could be accomplished by mocking
|
Interesting, I wasn't seeing any error, just an exiting test. Lemme try to reproduce more realistically. Yeah, we can mock out |
This is what I ran. Note that exit code 130 is SIGINT (128 + 2). console.log isn't working because jest buffers it, so I logged to a temp file. This is on a Mac.
|
OK, the jest issue occurs with just plain jest, not even ts-jest.
|
Thanks for the example and logging the issue with jest. I am receiving the same results as you based on your gist within the issue you filed. I'm just running on Ubuntu 18.04 and using Node 14.7.0. I did try with Node 12.18.3 (LTS) and received the same results as well, with your example and my initial test too. I agree that it would be ideal to test with actual signals rather mock them out, so hopefully we can get to the bottom of what is going on with jest. I suppose at least we have a fallback in the interim. |
if ( | ||
handleSignals === true || | ||
(typeof this.config.engine === 'object' && | ||
this.config.engine.handleSignals === true) || | ||
(process.env.NODE_ENV !== 'test' && | ||
handleSignals !== false && | ||
(typeof this.config.engine !== 'object' || | ||
this.config.engine.handleSignals !== false)) | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's guard for the existence of process
at least?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remind me what the best syntax for doing this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the use case this would not be present? It seems this should be present due to this being a global object of node, unless there is a name collision within the file.
As for syntax, could do something along the lines of:
if (process && process.env.NODE_ENV...
or (event more safe)
if (process instanceof EventEmitter && process.env.NODE_ENV...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the concern is working towards making as much of Apollo Server as possible not dependent on Node-specific APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Makes sense, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to go with isNodeLike
here. (If that's not good enough for your environment you can always specify the option explicitly.)
// Node v10 so we can't use that feature here. | ||
const handler: NodeJS.SignalsListener = async () => { | ||
await this.stop(); | ||
process.kill(process.pid, signal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize this originated in the code from before, but I don't understand why we need send the same signal
to ourselves in response to having received that signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The theory is that we are not trying to prevent the process from dying when asked to, but just let it do some work before dying as requested.
It's not really a primitive that composes well, but I don't know of a better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that still happen if we didn't call process.kill
? I'm fine leaving it, it just seemed cyclical (though not in a way that would prevent the server from shutting down.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be confused, but I'm pretty sure that handling a signal means that the signal is handled and the default behavior of the process exiting doesn't occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't actually believe that to be the case, but I haven't double clicked on that idea in a while.
this.toDispose.add(async () => { | ||
await Promise.all( | ||
serverListeners.map(({ serverWillStop }) => serverWillStop?.()), | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think harmless is probably a true assessment, particularly since this is not a hot-spot in the code from a performance standpoint, but I think it's worth noting that it does create an additional Promise which needs to be resolved and await always yields to the event loop so it would be processed on the next tick. (Might even add an entry to the stack?) If we were doing it often, there might be memory implications. While some runtimes might (eventually) optimize it out via an optimization known as tail call optimization, that is not an optimization that exists in V8 today and it may never land in many runtimes (See link).
Since this is a shutdown mechanism, we might actually be counting down the ticks until we can terminate, though I don't think anything here would be anything more than a microtask so I don't believe we're actually putting anything on the next full turn of the event loop.
I might say that returning the Promise directly is probably preferred. However, to be very clear, this is just me trying to shed light on Promise execution dynamics, not me asking for a change.
@@ -124,6 +124,7 @@ export interface Config extends BaseConfig { | |||
playground?: PlaygroundConfig; | |||
gateway?: GraphQLService; | |||
experimental_approximateDocumentStoreMiB?: number; | |||
handleSignals?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping in mind all the other signals that we do not handle, I don't know that this name is clear enough at the top-level. It's really more like "handle TERM and INT on Node.js by calling AS's stop method".
Is it critical to ship the signal trapping as part of this implementation?
There was a very real reason why I have mentioned trying to avoid taking an opinionated stance as to what constitutes a shutdown for Apollo Server and self-instrumenting that since process.on
is a very Node.js-specific runtime concern. Further, SIGTERM
and SIGINT
both have default handlers in Node.js that I believe disappear when we provide our own.
Thus far, I've been fine with apollo-engine-reporting
using process
-things and performing this shutdown since it also uses other Node.js specific bits (like process.hrtime
) which render it unusable on non-Node.js runtimes. But hoisting this makes me want to think about it a bit more. For example, I'm not sure what happens on Windows, and Deno — if we ever can ever support it — uses a different mechanism (though I believe we could feature check for its process
-equivalent.
I'm not sure we really know what other existing configuration or races against process.on
handlers we're putting ourselves in the queue against. I believe those other mechanisms can short-circuit or trap the server's actual shutdown which might leave us running without the polling?
In my mind, to avoid these concerns, it seems plausible that it's better to leave it in the hands of the user and guide/document them accordingly. In Apollo Server 3.x's single package model, we still need to consider what the server lifetime hooks are. For example, maybe there's a different approach where we don't have a server lifetime, but rather use setTimeout
(not setInterval
!) to schedule work that is intentionally keeping the event loop alive, but not create agent-like-things that need to be stopped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think that stripping a final semi-reliable flush from reporting will have a negative impact on the quality of using Studio, and that requiring it to be 100% opt-in will have a negative impact too. This is going to affect any plugin that's batching information to be sent to any sort of monitoring service, so I think any AS3 plan has to have some solution for this use case.
We've had the AER signal handling on by default for over two years; I'm curious if you've heard any user complaints? I do agree that it doesn't feel bulletproof, but if we haven't had any complaints yet, then it feels "good enough for AS2" to me.
Certainly having it only default to true in Node-like contexts makes sense.
What if the default is "true if Node and Unix (how do I detect this?) and some plugin implements serverWillStop"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I certainly can agree re naming. stopOnSIGINTAndSIGTERM
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about handleTerminationSignals
, handleTermSignals
, stopOnTerminationSignals
, or stopOnTermSignals
? Not sure if this is too ambiguous or overly verbose. I was just thinking of trying to get the conjunction (e.g. "And") out of the variable name.
I drew inspiration from: https://www.gnu.org/software/libc/manual/html_node/Termination-Signals.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like stopOnTerminationSignals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel great about my suggestion of "register signal handler if plugins are looking for it", because the gateway is not a plugin and I do want it to be reliably stopped when I implement #4428. I'm going to stick with "any Node-like non-test environment defaults to handling signals".
I'll note that I see zero references to handleSignals
in issues on this repo or in our internal Slack which makes me believe that the current default behavior has not been a problem enough for users that we have ended up recommending "oh don't worry just set handleSignals: false
" to anyone ever. Similarly looking at GitHub code search (TypeScript, JS) doesn't show any public repos where people are setting this.
While the proposal certainly does broaden the contexts where the signals are handled, it seems like it hasn't been a problem so far. It also feels like a pretty reversible decision: if it causes a problem, then (a) there's the trivial workaround of setting stopOnTerminationSignals: false
and (b) we could perhaps adjust the heuristic to be less aggressive about defaulting to true in a future release. However, my suspicion based on our two years of AER signal handling is that it will not be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we can reverse if it results in a bug. Worth noting, however, that engine
has just needed to be disabled in all Node.js environments though because of its reliance on process
. It is disabled by default, so this path is, I think, highly more likely to get exercised now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean non-Node.js environments? I'm still not sure I follow, in that the current version of the PR checks isNodeLike
before defaulting to signal handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I did mean non-Node.js! I just mean that any bugs around handleSignals
would only be present for those who have enabled Engine/GM/Studio (which I don't think is a majority of users).
3d61611
to
67dec32
Compare
// Node v10 so we can't use that feature here. | ||
const handler: NodeJS.SignalsListener = async () => { | ||
await this.stop(); | ||
process.kill(process.pid, signal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that still happen if we didn't call process.kill
? I'm fine leaving it, it just seemed cyclical (though not in a way that would prevent the server from shutting down.)
this.toDispose.add(async () => { | ||
await Promise.all( | ||
serverListeners.map(({ serverWillStop }) => serverWillStop?.()), | ||
); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. .then
would work, but not worth changing. Fine with me!
requestDidStart?( | ||
requestContext: GraphQLRequestContext<TContext>, | ||
): GraphQLRequestListener<TContext> | void; | ||
} | ||
|
||
export interface GraphQLServerListener { | ||
serverWillStop?(): ValueOrPromise<void>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, might a serverDidStart
go here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean requestDidStart
?
In fact, as you predicted, having requestDidStart not be here is a bit awkward — so far in #4453 the usage reporting plugin is literally defining a function inside the scope of serverWillStart and assigning it to a let
that is called by requestDidStart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you mean
requestDidStart
?
I meant serverDidStart
, but that's based on there being some sort of phase between the two which certainly necessitates a bigger conversation to introduce. (Also a consideration of the actual life-time of this life-cycle.)
In fact, as you predicted, having requestDidStart not be here is a bit awkward — so far in #4453 the usage reporting plugin is literally defining a function inside the scope of serverWillStart and assigning it to a
let
that is called by requestDidStart.
Yeah, semi-related I guess I am still imaging in the future we might re-consider whether there is a server life-cycle? I'm not sure what to do about that now without supporting both modes (and one being a shorthand for the other)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant serverDidStart, but that's based on there being some sort of phase between the two which certainly necessitates a bigger conversation to introduce. (Also a consideration of the actual life-time of this life-cycle.)
I'm confused — this object is the thing returned from serverWillStart
?
await server.stop(); | ||
expect(fn.mock.calls).toEqual([['a'], ['b'], ['c'], ['d']]); | ||
}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, CHANGELOG.md would be great, and an either an addition to https://www.apollographql.com/docs/apollo-server/integrations/plugins/ as well (or an issue to track the need to add it with label: docs
)
Fixes #4273. This PR adds a serverWillStop plugin lifecycle hook. The `serverWillStop` hook is on an object optionally returned from a `serverWillStart` hook, similar to `executionDidStart`/`executionDidEnd`. ApolloServerPluginOperationRegistry uses this to stop its agent. The code that installs SIGINT and SIGTERM handlers unless disabled with `handleSignals: false` is hoisted from EngineReportingAgent to ApolloServer itself and renamed to `stopOnTerminationSignals` as a new ApolloServer option. The new implementation also skips installing the signals handlers by default if NODE_ENV=test or if you don't appear to be running in Node (and we update some tests that explicitly set other NODE_ENVs to set handleSignals: false). The main effect on existing code is that on one of these signals, any SubscriptionServer and ApolloGateway will be stopped in addition to any EngineReportingAgent.
67dec32
to
dc5b8f5
Compare
I'm going to close this PR and allow it to merge as part of #4453. |
Fixes #4273.
This PR adds a serverWillStop plugin lifecycle hook. The
serverWillStop
hookis on an object optionally returned from a
serverWillStart
hook, similar toexecutionDidStart
/executionDidEnd
.ApolloServerPluginOperationRegistry uses this to stop its agent.
The code that installs SIGINT and SIGTERM handlers unless disabled with
handleSignals: false
is hoisted from EngineReportingAgent to ApolloServeritself and renamed to
stopOnTerminationSignals
as a new ApolloServeroption. The new implementation also skips installing the signals handlers by
default if NODE_ENV=test or if you don't appear to be running in Node (and we
update some tests that explicitly set other NODE_ENVs to set handleSignals:
false).
The main effect on existing code is that on one of these signals, any
SubscriptionServer and ApolloGateway will be stopped in addition to any
EngineReportingAgent.