Skip to content

Fix race condition and a stack error caused by too old changesets#1025

Merged
marcelklehr merged 3 commits intoether:developfrom
amtep:develop
Sep 28, 2012
Merged

Fix race condition and a stack error caused by too old changesets#1025
marcelklehr merged 3 commits intoether:developfrom
amtep:develop

Conversation

@amtep
Copy link
Copy Markdown
Contributor

@amtep amtep commented Sep 26, 2012

Here are two fixes for server stability problems that we encountered when stress testing etherpad-lite.

We found the first problem while looking for the second problem. The second problem was seen during real use.

We developed a dedicated stresstest client in the process :) You can see it at https://bitbucket.org/rbraakman/etherpad-stresstest if you're interested. It doesn't implement the whole protocol, we were just using it to try to reproduce known problems.

When stress testing etherpad-lite we occasionally got this error:

TypeError: Cannot read property 'author' of undefined
    at /home/etherpad/etherpad-lite/src/node/handler/PadMessageHandler.js:556:47

handleUserChanges was accessing sessioninfos[client.id].author in a callback,
after spending some time in the loop that updates the changeset to the
latest revision. It's possible for a disconnect request to be processed
during that loop so the session might no longer be there.

This patch fixes it by looking up the author at the start of the function.
We had a problem with the server running out of stack space if a client
submitted a changeset based on a revision more than about 1000 revs old.
(944 was our cutoff but yours may vary). This happened in the wild with
about 30 people editing via flaky wifi. A disconnected client would try
to submit a fairly old changeset when reconnecting, and a few minutes
was enough for 30 people to generate that many revs.

The stack kept growing because pad.getRevisionChangeset was being answered
from the cache, so no I/O interrupted the callback chain. (This was seen with
mysql, I don't know about other backends.)

This patch forces a nextTick every 200 revisions to solve this problem.
@JohnMcLear
Copy link
Copy Markdown
Member

This looks good thanks, I will review ASAP and get back to you :)

@JohnMcLear
Copy link
Copy Markdown
Member

Ah it's very short, all looks good to me, @marcelklehr do you want to have any input on the code? I'm happy to pull if you are :)

@amtep
Copy link
Copy Markdown
Contributor Author

amtep commented Sep 26, 2012

The stack space fix might be the fix for https://github.com/Pita/etherpad-lite/issues/800 as well

@marcelklehr
Copy link
Copy Markdown
Contributor

Oh, indeed written by very smart people. Amazing!

@amtep,

Could you change this, so only the look-up is moved to the beginning -- so instead of thisAuthor = sessioninfos[client.id].author it would be something like thissession = sessioninfos[client.id] -- probably easier to understand. (and perhaps add a comment to that line, too)

Incidentally, could you do the same over here PadMessageHandler.js#L876 ? (someone had similar issues, but with CLIENT_READY -- see #1023)

thank you!

@marcelklehr
Copy link
Copy Markdown
Contributor

And I agree, that this very probably will fix #800 -- yay!

@amtep
Copy link
Copy Markdown
Contributor Author

amtep commented Sep 26, 2012

Hmm yeah, the thisAuthor thing needs a comment :)

But I'm not sure if saving the whole session is a good idea. The author is known not to change, but there are other things in the session (such as .rev) where using a saved copy can lead to subtle bugs. That's why I'd rather be conservative about what to save. What do you think?

What's the protocol for changing a pull request? Should I rebase on my side, or just add a commit on top?

About #1023, that looks like a case where the function just shouldn't proceed if the session is gone. No point in announcing USER_NEWINFO for a user who already left. We can look into patching that.

@marcelklehr
Copy link
Copy Markdown
Contributor

Just add a commit on top :)

Mmh. It looks like handleUserChanges just makes use of
sessioninfos[client.id].padId and .author, also, as I see it it
wouldn't be a literal copy, but just a reference to the session object.
Even if the session gets deleted in sessioninfos you still have access
to its properties (afaik, the session entries aren't replaced by new
objects, they are just mutated).

So.. if you are seriously concerned about this, don't change it. -
otherwise I'd recommend applying my suggested change...

@amtep
Copy link
Copy Markdown
Contributor Author

amtep commented Sep 26, 2012

Oh, yeah, I didn't think of that. The saved session is still a shared reference. Too much Qt on my mind :)

Also add a comment to explain what's going on with thisSession.
No changes in behavior.
@amtep
Copy link
Copy Markdown
Contributor Author

amtep commented Sep 27, 2012

I'd like to deal with #1023 in a separate pull request, so that it doesn't block these fixes. We want to give it the full treatment, starting with extending the client to reproduce that bug :)

@marcelklehr
Copy link
Copy Markdown
Contributor

Agreed. thank you! :)

marcelklehr added a commit that referenced this pull request Sep 28, 2012
Fix race condition and a stack error caused by too old changesets
@marcelklehr marcelklehr merged commit 3578e36 into ether:develop Sep 28, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants