
Fix crash with queued messages#2092

Merged
JohnMcLear merged 2 commits into ether:develop from webzwo0i:fix-crash-with-queued-messages
Mar 26, 2014

Conversation

@webzwo0i
Member

If a disconnect message is received, handleDisconnect is called and deletes the session.
If any other message is received, handleMessage is called and a lot of callbacks are fired to handle the message. At the beginning of handleMessage there is a check that ensures the session exists.

If the server is under heavy I/O load, callbacks get delayed, so it can happen that a disconnect message is handled before all the callbacks that depend on the existence of the session have run.
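The sequence above can be sketched with a minimal, self-contained model (hypothetical names, not Etherpad's actual code): the up-front check in handleMessage passes, the real work is deferred, and a disconnect processed in between deletes the session the deferred callback relies on.

```javascript
// Shared "session store", used by both message and disconnect handling.
const sessions = {};
let lastOutcome = null; // recorded so the effect is observable

function handleConnect(id) {
  sessions[id] = { id, rev: 0 };
}

function handleDisconnect(id) {
  delete sessions[id]; // the session is gone from here on
}

function handleMessage(id, msg) {
  if (!sessions[id]) return; // the existing up-front check passes...
  // ...but the real work is deferred, as if waiting on the database:
  setImmediate(() => {
    // By the time this runs, a disconnect may already have been handled.
    lastOutcome = sessions[id] ? 'ok' : 'session gone';
    console.log(lastOutcome);
  });
}

handleConnect('s1');
handleMessage('s1', { type: 'COLLABROOM' });
handleDisconnect('s1'); // processed before the deferred callback fires
// prints "session gone"
```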

A similar patch was applied in the past (#1025), but it was not enough. Etherpad Lite will still crash if I use the bug in #2091 to insert a lot of changesets, up to the point where ueberDB's internal cache size is reached. Suppose the default size of 1000 is used. At about 1000 revisions, the garbage collector kicks in and deletes half of the cache, oldest entries first (say revisions 1 up to 500). I then submit a new changeset, the cache fills up with 500 new entries, and the GC starts again and deletes revisions 501 up to 1000.
At this point ueberDB's cache is completely useless, and every request that is necessary to rebase the changeset onto the current head will hit the database (as if the cache size were 0); the result is a lot of I/O.
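The eviction pattern can be illustrated with a toy cache (a sketch under the assumptions above; ueberDB's real implementation differs): insertion-ordered entries capped at 1000, with the oldest half dropped whenever the cap is hit.

```javascript
const CACHE_SIZE = 1000;
const cache = new Map(); // a Map iterates keys in insertion order

function put(rev) {
  if (cache.size >= CACHE_SIZE) {
    // GC: drop the oldest half of the entries.
    const oldest = [...cache.keys()].slice(0, CACHE_SIZE / 2);
    for (const k of oldest) cache.delete(k);
  }
  cache.set(rev, `changeset ${rev}`);
}

// Write revisions 1..1501: the first GC drops revisions 1..500,
// the second drops 501..1000, as in the scenario described above.
for (let rev = 1; rev <= 1501; rev++) put(rev);

// Rebasing a changeset with baseRev 0 needs the old revisions,
// but they have all been evicted -> every lookup hits the database.
console.log(cache.has(500), cache.has(1000), cache.has(1001), cache.size);
// prints "false false true 501"
```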

Now I submit some changesets with baseRev:0 followed by a disconnect. Some of them will be queued in a channel, and by the time they are processed the session is gone. For others, the check at the beginning of handleMessage passes, but by the time the finalHandler runs the session is gone.

I think it may be possible to predict exactly the places where this can happen, but my understanding of Node.js' event system is not good enough for that. My thinking is that it can only happen in async functions that queue processing with nextTick/setTimeout/setImmediate, so every code path that uses those functions must ensure that the session is still present.

That's why I left a "TODO" in the patch. At a later point, instead of checking in so many different places whether the session (or some object keys in the message, etc.) is still present, it would be easier to check this in only one place (at the beginning of handleMessage), make a copy of the session, and hand that copy over to all callbacks that need it. Every message that had a proper session when it arrived would then always have one during later processing, no matter how much delay there is in between.
What do you think?
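A minimal sketch of the proposed snapshot approach (hypothetical names again, not the actual patch): validate the session once at the entry point, copy it, and let deferred callbacks use the copy so a later disconnect cannot pull it out from under them.

```javascript
const sessions = {};
let lastResult = null; // recorded so the effect is observable

function handleConnect(id) {
  sessions[id] = { id, author: 'a.' + id };
}

function handleDisconnect(id) {
  delete sessions[id];
}

function handleMessage(id, msg) {
  const live = sessions[id];
  if (!live) return; // the single check, at the entry point
  const session = { ...live }; // snapshot; a later delete can't affect it
  setImmediate(() => {
    // Works even though the original entry may be gone by now.
    lastResult = `processed for ${session.author}`;
    console.log(lastResult);
  });
}

handleConnect('s1');
handleMessage('s1', { type: 'COLLABROOM' });
handleDisconnect('s1'); // processed first, but the snapshot survives
// prints "processed for a.s1"
```

Whether downstream code can safely act on a stale snapshot is a separate question, as the discussion below notes.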

@marcelklehr
Contributor

I think it's a good idea!

@marcelklehr
Contributor

Would you like to add the proposed changes here, too, or should I pull this already?

@marcelklehr marcelklehr mentioned this pull request Feb 26, 2014
Contributor


We could store the session in the queue alongside the other stuff, but I think we might still run into errors, as the session would be invalid.

@marcelklehr
Contributor

Other than that this is good to pull, I'd say.

@JohnMcLear
Member

@webzwo0i did you want to update before we merge?

@JohnMcLear
Member

@marcelklehr did you want to see this in 1.4?

@marcelklehr
Contributor

Yes, this is good to pull into 1.4 imo

JohnMcLear added a commit that referenced this pull request Mar 26, 2014
@JohnMcLear JohnMcLear merged commit 56fd078 into ether:develop Mar 26, 2014
@JohnMcLear
Member

Congrats @webzwo0i

@webzwo0i webzwo0i deleted the fix-crash-with-queued-messages branch April 18, 2015 18:48