Fix race condition and a stack error caused by too old changesets #1025
marcelklehr merged 3 commits into ether:develop
When stress testing etherpad-lite, we occasionally got this error:
TypeError: Cannot read property 'author' of undefined
at /home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:556:47
handleUserChanges was accessing sessioninfos[client.id].author in a callback,
after spending some time in the loop that updates the changeset to the
latest revision. It's possible for a disconnect request to be processed
during that loop, so the session might no longer be there.
This patch fixes it by looking up the author at the start of the function.
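A minimal sketch of the fix follows. It assumes a simplified model: `sessioninfos` is a plain object keyed by client id, the catch-up loop is modelled with `setImmediate` instead of real database reads, and the callback signature is hypothetical rather than etherpad-lite's. It shows only the key idea. The author is read once, before any asynchronous step, so a disconnect handled mid-loop can no longer make the final lookup hit `undefined`.

```javascript
// Hypothetical, simplified model of the race in handleUserChanges.
// Only the names sessioninfos, client.id and author come from the PR text;
// the function body is an illustration, not etherpad-lite's code.
const sessioninfos = {};

function handleUserChanges(client, revsBehind, callback) {
  // Look the author up once, before any asynchronous work starts.
  const session = sessioninfos[client.id];
  if (!session) return callback(new Error('no session for ' + client.id));
  const thisAuthor = session.author;

  let r = 0;
  (function catchUp() {
    if (r === revsBehind) {
      // Reading sessioninfos[client.id].author here instead would throw a
      // TypeError if a disconnect was handled while the loop was running.
      return callback(null, thisAuthor);
    }
    r++;
    setImmediate(catchUp); // stands in for an asynchronous revision lookup
  })();
}

// Usage: the disconnect is processed before the catch-up loop finishes.
sessioninfos.c1 = { author: 'a.1' };
handleUserChanges({ id: 'c1' }, 3, (err, author) => console.log(err, author));
delete sessioninfos.c1;
```

Reading `sessioninfos[client.id].author` inside the final callback instead would reproduce the `TypeError` above. The `delete` runs synchronously, before the first `setImmediate` callback fires.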
We had a problem with the server running out of stack space when a client submitted a changeset based on a revision more than about 1,000 revisions old (944 was our cutoff, but yours may vary). This happened in the wild with about 30 people editing over flaky Wi-Fi. A disconnected client would try to submit a fairly old changeset when reconnecting, and a few minutes was enough for 30 people to generate that many revisions. The stack kept growing because pad.getRevisionChangeset was being answered from the cache, so no I/O interrupted the callback chain. (This was seen with MySQL; I don't know about other backends.) This patch forces a nextTick every 200 revisions to solve this problem.
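The stack growth and the periodic `nextTick` can be sketched as follows. This is a hypothetical model, not the patched code. `getRevisionChangesetCached` stands in for a cache hit on `pad.getRevisionChangeset` that calls back synchronously, and `catchUpToHead` stands in for the catch-up loop in handleUserChanges. Only the interval of 200 revisions comes from the patch description.

```javascript
// Hypothetical stand-in for pad.getRevisionChangeset answering from a cache:
// the callback runs synchronously, so no I/O ever unwinds the stack.
function getRevisionChangesetCached(rev, callback) {
  callback(null, 'changeset#' + rev);
}

// Visit every revision from fromRev to headRev (inclusive). The real handler
// would use each changeset to rebase the client's changeset; this sketch only
// records which changesets were visited.
function catchUpToHead(fromRev, headRev, callback) {
  const visited = [];
  let r = fromRev;

  function step() {
    if (r > headRev) return callback(null, visited);
    getRevisionChangesetCached(r, (err, changeset) => {
      if (err) return callback(err);
      visited.push(changeset);
      r++;
      // Without a break, every revision adds frames to the same stack.
      // Deferring every 200 revisions lets the stack unwind.
      if (r % 200 === 0) {
        process.nextTick(step);
      } else {
        step();
      }
    });
  }

  step();
}

// Usage: far more revisions than the ~1000 that overflowed the stack.
catchUpToHead(1, 100000, (err, visited) => console.log(err, visited.length));
```

`process.nextTick` is enough to reset the stack. However, its queue is drained before Node returns to I/O, so other clients still wait until the whole catch-up finishes. `setImmediate` would also let pending I/O run between batches, at some cost in latency.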
This looks good, thanks! I will review ASAP and get back to you :)
Ah, it's very short; all looks good to me. @marcelklehr, do you want to have any input on the code? I'm happy to pull if you are :)
The stack space fix might fix https://github.com/Pita/etherpad-lite/issues/800 as well.
Oh, indeed written by very smart people. Amazing! Could you change this so that only the look-up is moved to the beginning, so instead of … Incidentally, could you do the same over here: PadMessageHandler.js#L876? (Someone had similar issues, but with …) Thank you!
And I agree that this will very probably fix #800. Yay!
Hmm, yeah, the thisAuthor thing needs a comment :) But I'm not sure if saving the whole session is a good idea. The author is known not to change, but there are other things in the session (such as .rev) where using a saved copy can lead to subtle bugs. That's why I'd rather be conservative about what to save. What do you think? What's the protocol for changing a pull request? Should I rebase on my side, or just add a commit on top? About #1023, that looks like a case where the function just shouldn't proceed if the session is gone. There's no point in announcing USER_NEWINFO for a user who has already left. We can look into patching that.
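The early return suggested for #1023 might look roughly like this. It is a hypothetical sketch. `announceUserInfo`, its `broadcast` parameter, and the flattened message shape are illustrative names, not etherpad-lite's API. `setImmediate` stands in for whatever asynchronous lookup happens before the announcement.

```javascript
// Hypothetical sketch of "don't proceed if the session is gone".
// announceUserInfo, broadcast and the message shape are illustrative only.
const sessioninfos = {};

function announceUserInfo(client, broadcast, callback) {
  // Simulated asynchronous lookup (e.g. author name and colour).
  setImmediate(() => {
    const session = sessioninfos[client.id];
    if (!session) {
      // The client disconnected while we were waiting: nobody to announce,
      // so stop instead of dereferencing undefined.
      return callback(null, false);
    }
    broadcast({ type: 'USER_NEWINFO', author: session.author });
    callback(null, true);
  });
}
```

The function reports whether it announced anything. That lets a caller distinguish "user left" from a real error without treating the missing session as a failure.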
Just add a commit on top :) Mmh. It looks like handleUserChanges just makes use of … So, if you are seriously concerned about this, don't change it.
Oh, yeah, I didn't think of that. The saved session is still a shared reference. Too much Qt on my mind :)
Also add a comment to explain what's going on with thisSession. No changes in behavior.
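The point about the shared reference can be checked directly. This sketch uses plain objects only. `thisSession` mirrors the name used in the commit; everything else is illustrative. The saved variable and the `sessioninfos` entry point to the same object, so later updates such as a new `.rev` stay visible. Deleting the map entry on disconnect does not invalidate the saved object. A spread copy, by contrast, goes stale.

```javascript
// Illustration only: a saved variable holds a reference, not a copy.
const sessioninfos = { c1: { author: 'a.1', rev: 10 } };

// Saving a reference at the start of the handler, as the thread suggests.
const thisSession = sessioninfos.c1;
// A value copy, for contrast (hypothetical; not what the patch does).
const snapshot = { ...sessioninfos.c1 };

// Another handler advances the client's revision...
sessioninfos.c1.rev = 11;
// ...and later the disconnect handler removes the entry.
delete sessioninfos.c1;

console.log(thisSession.rev);    // 11: the reference sees the update
console.log(snapshot.rev);       // 10: the copy is stale
console.log(thisSession.author); // a.1: still readable after the delete
```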
I'd like to deal with #1023 in a separate pull request, so that it doesn't block these fixes. We want to give it the full treatment, starting with extending the client to reproduce that bug :)
Agreed. Thank you! :)
Here are two fixes for server stability problems that we encountered when stress testing etherpad-lite.
We found the first problem while looking for the second problem. The second problem was seen during real use.
We developed a dedicated stress-test client in the process :) You can see it at https://bitbucket.org/rbraakman/etherpad-stresstest if you're interested. It doesn't implement the whole protocol; we were just using it to try to reproduce known problems.