Uncaught exception in socket code: "Cannot read properties of null (reading 'finishWrite')" #46094
Comments
I have now also managed to reproduce this error on v19.3.0 (on the Mac). In addition to the error noted in the original report, my repro-case-in-progress also manages to get a second error to be thrown (moments [CORRECTION]
|
Hi again! I'm afraid I still don't have a succinct repro, but I can tell you a bit more about what's going on. In my codebase, I'm wrapping the raw socket that gets accepted from a When the session times out, (per original report) I call After the call to I have an "investigatory workaround" which "fixes" the problem by delaying my code acting on the |
I figured out what's going on in my code, and though I still think there's a Node bug worth addressing, I am no longer triggering it: I made a mistake with my log-based instrumentation (from the previous comment), and I was wrong about What I was right about was that HTTP2 tacitly expects The solution in my code was (of course?) to specify I checked the Node docs for the Thank you for going on this journey with me, and wishing you all a happy new year. |
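The name of the option in the comment above was lost from this copy of the thread. The diff in a later comment flips `allowHalfOpen` from `true` to `false` while trying to re-trigger the bug, so the option is most likely `allowHalfOpen`. Assuming that, here is a minimal sketch of passing it to `net.createServer()`. With `allowHalfOpen: true`, Node does not automatically end an accepted socket's writable side when the peer ends its side, so whatever owns the socket has to call `end()` itself. The helper name `createAcceptServer` is hypothetical.

```javascript
const net = require('node:net');

// Hypothetical helper, not taken from the project discussed in this thread.
// Assumes the elided option above is `allowHalfOpen`.
function createAcceptServer(onSocket) {
  // allowHalfOpen: true keeps an accepted socket writable after the peer has
  // ended its side. Node will not call socket.end() automatically, so the
  // owner of the socket (e.g. whatever it gets handed off to) must do so.
  return net.createServer({ allowHalfOpen: true }, onSocket);
}

const server = createAcceptServer((socket) => {
  socket.on('end', () => {
    console.log('peer finished sending; our writable side is still open');
  });
});
```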
Maybe this is the same bug, or at least a related one: #35695. That one is closed, but it sure looks similar. |
BTW, I just ran into another way to get this error:
|
This PR reworks the last couple days' worth of effort to do something useful when connected sockets time out. Notably, during the course of this work, I ran into another case of the `null...finishWrite` exception. Existing Node bugs (the latter was filed by me):
* <nodejs/node#35695>
* <nodejs/node#46094>
Node does not allow direct manipulation of `http2session.socket`; see https://nodejs.org/docs/latest-v14.x/api/http2.html#http2_http2session_socket |
I'm accepting sockets in my code directly, wrapping them (to do rate limiting, timeout, and logging at the low level), then handing the wrapped sockets off to an
|
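The arrangement described in the comment above (accept sockets directly, wrap them, then hand them to an HTTP/2 server) can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual code: it uses a cleartext `http2.createServer()` instead of TLS to stay short, the name `createHandoffServer` and the `log` callback are invented, and it hands each socket over by emitting `'connection'` on the HTTP/2 server, a common way to give an externally accepted socket to a Node server object that is not listening itself.

```javascript
const http2 = require('node:http2');
const net = require('node:net');

// Hypothetical sketch: accept TCP connections ourselves, do low-level
// bookkeeping (here just logging), then pass each socket to an HTTP/2 server.
// Cleartext HTTP/2 is used for brevity; the setup in this report used TLS.
function createHandoffServer(http2Server, serverOptions = {}, log = () => {}) {
  return net.createServer(serverOptions, (socket) => {
    const peer = `${socket.remoteAddress}:${socket.remotePort}`;
    log(`accepted ${peer}`);
    socket.on('close', (hadError) => log(`closed ${peer} (hadError: ${hadError})`));

    // Hand the socket to the HTTP/2 server, which sets up a session on it.
    http2Server.emit('connection', socket);
  });
}

const h2Server = http2.createServer();
h2Server.on('stream', (stream) => {
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end('hello\n');
});

const outer = createHandoffServer(h2Server, { allowHalfOpen: true }, console.log);
// outer.listen(8080);
// Then, e.g.: curl --http2-prior-knowledge http://localhost:8080/
```

`allowHalfOpen: true` is passed here only because the later diff in this thread treats `true` as the working setting; this sketch does not establish that HTTP/2 requires it.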
How did you
Could you point out which lines of code are responsible for this? (I glanced at your code but couldn't find them.) |
Actually, I have managed to do the same thing but in a different way.
|
@ywave620 Not sure if you still want an answer, but just in case, here's (I think) what you're looking for, with a bit of extra context:
(NB, I've been working on this area of the code lately, so the line numbers are off from my links above. I'll link to a tag here instead of
|
That's indeed what I'm doing (though along with other stuff too). |
@danfuzz are you still seeing this issue? I have a fix for a closely related issue that I think likely resolves this too. I can't reliably reproduce the If you can still reproduce this, can you try rebuilding node with the changes from #49327, and then see if that fixes it for you? |
@pimterry I have not seen the issue since that comment you referenced, because I adjusted my timeouts (etc.) to avoid the situation. I'm afraid my development workflow doesn't involve (re)building Node, so it'd take more time than I have available right now to do that in a timely fashion for you. On the flip-side, the project in which I ran into the problem — https://github.com/danfuzz/lactoserv — is open source and builds pretty quickly. The
The un-fix for the other case (the one you reference) isn't too bad either. The two timeouts in question are In either case, if you build and run using the default (development) config, then just use a browser (or at least something that does lasting HTTP2 sessions, e.g. not Build / run instructions are at https://github.com/danfuzz/lactoserv/blob/main/doc/development.md. If you decide to do this and run into any trouble, please do not hesitate to reach out. |
I've tried to do that @danfuzz, but it doesn't reproduce for me. Diff:
```diff
diff --git a/src/network-protocol/private/AsyncServer.js b/src/network-protocol/private/AsyncServer.js
index 5d76748e..496f706b 100644
--- a/src/network-protocol/private/AsyncServer.js
+++ b/src/network-protocol/private/AsyncServer.js
@@ -275,7 +275,7 @@ export class AsyncServer {
* `ProtocolWrangler` class doc for details.
*/
static #CREATE_PROTO = Object.freeze({
- allowHalfOpen: { default: true },
+ allowHalfOpen: { default: false },
keepAlive: null,
keepAliveInitialDelay: null,
noDelay: null,
diff --git a/src/network-protocol/private/Http2Wrangler.js b/src/network-protocol/private/Http2Wrangler.js
index 1118d47e..49c2508c 100644
--- a/src/network-protocol/private/Http2Wrangler.js
+++ b/src/network-protocol/private/Http2Wrangler.js
@@ -262,5 +262,5 @@ export class Http2Wrangler extends TcpWrangler {
* @type {number} How long in msec to wait for a session to have activity
* before considering it "timed out" and telling it to close.
*/
- static #SESSION_TIMEOUT_MSEC = 1 * 60 * 1000; // One minute.
+ static #SESSION_TIMEOUT_MSEC = 10 * 1000; // Ten seconds.
}
diff --git a/src/network-protocol/private/TcpWrangler.js b/src/network-protocol/private/TcpWrangler.js
index 03d47d0a..e24ea5c2 100644
--- a/src/network-protocol/private/TcpWrangler.js
+++ b/src/network-protocol/private/TcpWrangler.js
@@ -253,7 +253,7 @@ export class TcpWrangler extends ProtocolWrangler {
* socket (a/o/t a server socket doing a `listen()`) to be "timed out." When
* timed out, a socket is closed proactively.
*/
- static #SOCKET_TIMEOUT_MSEC = 3 * 60 * 1000; // Three minutes.
+ static #SOCKET_TIMEOUT_MSEC = 3 * 1000; // Three seconds.
/**
* @type {number} Grace period in msec after trying to close a socket due to
```
Testing with both Node v18.17.1 and v20.5.1, running with Cumulatively between #49327 and #49400 I'm fairly confident I've fixed a whole class of issues that should cover this case, so it'd be great to be able to confirm that here if you can help at all. |
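The `TcpWrangler` constants in the diff above describe an idle timeout after which a socket is closed proactively, plus a grace period whose description is cut off. The project's actual logic isn't shown here, so what follows is a hypothetical sketch of one way such a policy can work with `net.Socket`. `socket.setTimeout()` only emits `'timeout'` and closes nothing, so the handler first asks for a graceful close with `end()` and, if the socket still hasn't closed after the grace period, calls `destroy()`. The function name and the injectable `schedule`/`cancel` parameters (there to make it testable) are assumptions. Whether ending a socket out from under an HTTP/2 session is safe is exactly what this thread is probing, so treat this as illustrating the timeout mechanics only.

```javascript
// Hypothetical sketch; not the project's actual TcpWrangler code.
// idleMsec and graceMsec play the roles of the SOCKET_TIMEOUT_MSEC constant
// and the grace-period constant from the diff.
function installIdleTimeout(socket, { idleMsec, graceMsec, schedule = setTimeout, cancel = clearTimeout }) {
  let graceTimer = null;

  // net.Socket#setTimeout only arranges for a 'timeout' event after idleMsec
  // of inactivity; the socket is not closed automatically.
  socket.setTimeout(idleMsec);

  socket.on('timeout', () => {
    if (graceTimer !== null) {
      return; // Already closing; don't restart the grace period.
    }
    socket.end(); // Graceful: flush pending writes, then send FIN.
    graceTimer = schedule(() => {
      // Still open after the grace period: force it closed.
      if (!socket.destroyed) {
        socket.destroy();
      }
    }, graceMsec);
  });

  socket.on('close', () => {
    if (graceTimer !== null) {
      cancel(graceTimer);
    }
  });
}
```

In a server this would be called from the connection listener, e.g. `installIdleTimeout(socket, { idleMsec: 3 * 60 * 1000, graceMsec: 250 })`, where the 250 ms grace value is made up for illustration.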
@pimterry Sorry for the trouble. It's been a while since my head's been in this code; I thought what I suggested would work for you, but clearly I was wrong! But I think the news is at least good-ish: After trying a few things I figured out that I could induce the problem once your diffs (above) were applied, including the error I mentioned in #46094 (comment), by using This was on Node v20.0.0, as installed on macOS via
|
Version
v18.4.0
Platform
Linux i-00f751a18edbe3eb6.us-west-2.compute.internal 5.15.73-45.135.amzn2022.x86_64 #1 SMP Fri Oct 14 17:47:15 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
http2 (maybe actually net or tls)
What steps will reproduce the bug?
Apologies that I don't have an easy repro case yet. It's one of those "run my server for a while and then it does this." Near as I can tell, it's along these lines:
* `Http2Server` (via TLS socket)
* `.close()` on the session. (Tick/turn concludes.)
* `TypeError: Cannot read properties of null (reading 'finishWrite')`
When I get a chance (hopefully soon) I will try to distill down a real example. Thanks for any help you can offer in the meantime.
How often does it reproduce? Is there a required condition?
Unclear. This shows up about once a day on my not-very-active server. It does correspond one-to-one with a call to `session.close()`. That is, every time my server does this right now, this uncaught exception seems to follow.
What is the expected behavior?
No uncaught exception. That is, either no exception at all, or an exception that can be safely caught and handled without shutting the system down.
What do you see instead?
This uncaught exception:
Additional information
This is the actual code which sets up the timeout. The timeout time is 5 minutes, and the `logger...` call simply writes a structured log line.
`Http2Session.close()` does accept an optional callback, which is documented as being bound to the `close` event. However, this code already binds a handler to `close` (which does not seem to be getting called before the uncaught exception is generated).
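The timeout code referred to above was not preserved in this copy of the report. As a rough, hypothetical reconstruction of the pattern it describes (a 5-minute idle timeout, then `close()` on the session, plus a separate `close` handler), something like the following; `installSessionTimeout` and `log` are invented names, with `log` standing in for the structured logger. It relies on `Http2Session#setTimeout(msecs, callback)`, which registers `callback` as a `'timeout'` listener. It only illustrates the setup and does not work around the `finishWrite` exception.

```javascript
// Hypothetical sketch; not the reporter's actual code.
const SESSION_TIMEOUT_MSEC = 5 * 60 * 1000; // Five minutes, per the report.

function installSessionTimeout(session, log = () => {}) {
  // Separate 'close' handler, as described in the report. (Passing a
  // callback to close() would register it as a 'close' listener too.)
  session.on('close', () => log('session closed'));

  // Without an 'error' listener, an 'error' event on the session would be
  // thrown. This does not catch exceptions thrown directly by Node internals,
  // such as the uncaught TypeError in this report.
  session.on('error', (err) => log(`session error: ${err.message}`));

  // Runs after SESSION_TIMEOUT_MSEC with no activity on the session.
  session.setTimeout(SESSION_TIMEOUT_MSEC, () => {
    log('session timed out; closing');
    // Graceful close: no new streams are accepted, and existing streams
    // are allowed to finish before the session goes away.
    session.close();
  });
}

// Usage with an HTTP/2 server:
// server.on('session', (session) => installSessionTimeout(session, console.log));
```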