Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash due to "mismatched follow" #472

Closed
riseuplabs opened this issue Feb 20, 2012 · 27 comments
Closed

crash due to "mismatched follow" #472

riseuplabs opened this issue Feb 20, 2012 · 27 comments

Comments

@riseuplabs
Copy link

Crash happened, causing the server to restart (and clients to be disconnected), stack trace:

[2012-02-20 09:48:23.623] [ERROR] console - Error: exports: mismatched follow
at Object.error (/Changeset.js:34:11)
at Object.assert (/Changeset.js:41:13)
at Object.follow (/Changeset.js:1797:11)
at Object.callback (/srv/etherpad-lite/node/handler/PadMessageHandler.js:431:35)
at /srv/etherpad-lite/node_modules/ueberDB/CloneAndAtomicLayer.js:120:17
at /srv/etherpad-lite/node_modules/ueberDB/CacheAndBufferLayer.js:345:7
at [object Object].get (/srv/etherpad-lite/node_modules/ueberDB/CacheAndBufferLayer.js:142:5)
at [object Object].getSub (/srv/etherpad-lite/node_modules/ueberDB/CacheAndBufferLayer.js:317:8)
at [object Object].doOperation as operatorFunction
at [object Object].emit (/srv/etherpad-lite/node_modules/ueberDB/node_modules/channels/channels.js:38:11)
RESTART!

@jrabier
Copy link

jrabier commented Apr 2, 2012

I can confirm this issue, with a similar stack trace.
I'm using node-v0.6.9 and revision dfa106d for etherpad-lite.

@mrneil
Copy link
Contributor

mrneil commented Jun 1, 2012

Like taziden, I had the same error and a similar stack trace this afternoon. I'm running etherpad-lite commit 309e3b0 on node v0.6.11.

Do either of you have logs from the crashes you experienced? The baserev property of the USER_CHANGES messages sent from the users normally increase by one each time. In this case, the baserev of the USER_CHANGES message immediately proceeding the crash went down by one.

By itself, I think that's okay. It just means the change is based on a previous revision (not the latest). But I also see a client-side JS error logged less than 10 seconds earlier. It's a shot in the dark, but logs in other cases where this error occurred may help us determine if there is any relation.

I still have to do some reading up on changesets.

@mrneil
Copy link
Contributor

mrneil commented Jun 17, 2012

I replaced PadMessageHandler.js line 431 (in the commit I'm running)

changeset = Changeset.follow(c, changeset, false, apool);

with

try
{
  changeset = Changeset.follow(c, changeset, false, apool);
}
catch(e)
{
  console.warn("Can't apply USER_CHANGES "+changeset+", possibly because of mismatched follow error");
  client.json.send({disconnect:"badChangeset"});
  return;
}

Explanation: Changeset.follow() runs assert() which throws an error if things don't match properly. The try-catch block disconnects the user submitting the mismatch. There's a very similar block about 30 lines above, which is what I based this on.

Feedback is appreciated.

@mrneil
Copy link
Contributor

mrneil commented Jun 18, 2012

EDIT - I would ignore this comment as it may have been premature. The issue here seems to be unrelated to the mismatched follow error and the code in my previous comment appears to solve that problem on its own. At the very least, further exploration is required.

Using the test from issue #752, my code catches the mismatched follow error and prevents a crash. However, any user who connects after that will cause a crash ("Cannot convert null to object").

To remedy the new error, I surrounded PadMessageHandler.js lines 713 and 714 with an if statement.

if(author){
delete author.timestamp;
historicalAuthorData[authorId] = author;
}

It works, but I have absolutely no idea what impact it has overall. Run it at your own risk.

And if you can shed any light on the validity of the code, I'm interested in hearing from you.

@JohnMcLear
Copy link
Member

Did we ever see a pull request with this fix in?

@mrneil
Copy link
Contributor

mrneil commented Jul 5, 2012

No, @johnyma22. I have not created a pull request for two reasons: (a) my lack of familiarity with git and pull requests, and (b) potential impact on other moving parts of Etherpad Lite.

If you or others with experience evaluate the code and determine it to be valid, I have no problem with it being added to the project.

@iamer
Copy link

iamer commented Aug 16, 2012

I started seeing this crash frequently with about 10 users, some using older browsers. The two code changes above abolished the crashes. Thank you very much !

A minor side effect is authors sometimes disappear for a short interval and then reappear again.

@mrneil
Copy link
Contributor

mrneil commented Aug 19, 2012

Hi @iamer,

I'm glad to know the code works for you! You said you applied two changes, but as I wrote above, you should ignore the second. (I've actually just edited the comment to apply strikethrough text so no one else gets confused.)

The correct code has been running for me without issue for quite a while. We're not seeing authors disappear and reappear, so I wonder what happens if you undo the code change that is (now) crossed out.

@iamer
Copy link

iamer commented Aug 19, 2012

I've reverted the spurious change, will keep an eye on the logs and report back. Thanks for your reply.

@JohnMcLear
Copy link
Member

I have added a task for myself to review this and issue a pull request after testing.

@iamer
Copy link

iamer commented Sep 11, 2012

Now with 30 users I am getting a similar crash, but not sure if it is the same.

[2012-09-11 14:44:03.357] [WARN] socket.io - client not handshaken 'client should reconnect'
[2012-09-11 14:44:11.095] [WARN] socket.io - handshake error Bad session / session has expired
[2012-09-11 14:44:11.787] [WARN] socket.io - handshake error Bad session / session has expired
[2012-09-11 14:44:13.568] [WARN] socket.io - handshake error Bad session / session has expired
[2012-09-11 14:44:18.313] [WARN] socket.io - handshake error Bad session / session has expired
[2012-09-11 14:44:47.818] [ERROR] console - CLIENT SIDE JAVASCRIPT ERROR: {"errorId":"oOnoX0idxIBcIAGjvvgn","msg":"NS_ERROR_FAILURE: Failure","url":"XXX","linenumber":1,"userAgent":"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1"}
[2012-09-11 14:44:50.124] [ERROR] console - CLIENT SIDE JAVASCRIPT ERROR: {"errorId":"chOMKbQzwvFEvHp4ZVXv","msg":"NS_ERROR_FAILURE: Failure","url":"XXX","linenumber":1,"userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0"}
[2012-09-11 14:45:06.686] [ERROR] console - CLIENT SIDE JAVASCRIPT ERROR: {"errorId":"BxW92nkuWWtJ0xt1nm2l","msg":"a is null","url":"XXX","linenumber":18,"userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:11.0) Gecko/20100101 Firefox/11.0"}
[2012-09-11 14:45:27.397] [ERROR] console - CLIENT SIDE JAVASCRIPT ERROR: {"errorId":"vpKOzVsuR90JzxuBseql","msg":"TypeError: a is null","url":"XXX","linenumber":18,"userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0"}
[2012-09-11 14:45:34.906] [ERROR] console - CLIENT SIDE JAVASCRIPT ERROR: {"errorId":"Tj9B84xt6Hv1f4nTWZjB","msg":"TypeError: a is null","url":"XXX","linenumber":18,"userAgent":"Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0"}
[2012-09-11 14:45:49.262] [WARN] console - Can't apply USER_CHANGES Z:du8>2|5=1v_e=1|1=1f_f=1|1=17_g=1|1=2c_j=1=1j_a|1+1_a_5_h_7_k+1$
*, possibly because of mismatched follow error
[2012-09-11 14:46:08.222] [ERROR] console - [RangeError: Maximum call stack size exceeded]
Ensure that all dependencies are up to date...
Ensure jQuery is downloaded and up to date...
Clear minfified cache...
ensure custom css/js files are created...
start...

@mrneil
Copy link
Contributor

mrneil commented Sep 11, 2012

Just a quick reply for now....

  • I see the client not handshaken 'client should reconnect' error on a very regular basis. In my case, I don't seem to get them as often (or sometimes at all) when the user that has poor connectivity is not using EPL that day.
  • I can't ever recall seeing the handshake error Bad session / session has expired error you list. Are you doing separate authentication using groups or sessions?
  • The NS_ERROR_FAILURE is also new to me.
  • I regularly receive the TypeError: a is null message. For me, it usually happens on line 57 though. I haven't determined what exactly is on that line so I don't know what's causing it, although it also seems to occur more frequently with poor connectivity.
  • Your mismatched follow error means that it ran the code above and caught the error. Good!
  • Finally, the Maximum call stack size exceeded seems eerily similar to issue Reconnect causes error "RangeError: Maximum call stack size exceeded" - detailed #800. But in that case, immediately before the error, I see several hundred NEW_CHANGES messages being sent to a single user as EPL builds and pushes the changesets to bring that user back to the same revision as the others. Unless you've removed a bunch of log entries, it doesn't appear to be the same cause. But I wonder if they are somehow connected.

Have you tried turning DEBUG level logging on? Perhaps it will isolate what is leading to the RangeError.

@iamer
Copy link

iamer commented Sep 11, 2012

  • I am using external LDAP auth through basic HTTP auth but that's not related to EPL
  • I haven't removed any log entries, but logging is not at DEBUG level ( too copious to handle on production with 30+ users hammering EPL)

I will try to write a testing script that replays HTTP traffic and reproduce the bug with debugging turned on.

@marcelklehr
Copy link
Contributor

@amtep I think this was fixed in #1023, wasn't it?

@amtep
Copy link
Contributor

amtep commented Oct 14, 2012

We haven't seen the mismatched follow error since applying the pull request from #1023, but I think the underlying problem hasn't been fixed. The "mismatched follow" would mean that the clients were sending bad changesets and we haven't found an explanation for that.

@marcelklehr marcelklehr mentioned this issue Nov 9, 2012
@joker-x
Copy link
Contributor

joker-x commented Nov 27, 2012

@mrneil : Your patch work... if your problem is "how to do a pull request with git", i gladly help you :)

@JohnMcLear
Copy link
Member

./etherpad-stresstest --clients=lurk:100,write:10 http://server:9001/p/foo

Will crash Etherpad after a while. We now have a method to reproduce so should be good to debug.

[2013-02-12 16:35:25.361] [ERROR] console - Error: exports: mismatched follow
    at Object.error (/var/www/node/etherpad-lite-dev/etherpad-lite/src/static/js/Changeset.js:39:11)
    at Object.assert (/var/www/node/etherpad-lite-dev/etherpad-lite/src/static/js/Changeset.js:53:13)
    at Object.exports.follow (/var/www/node/etherpad-lite-dev/etherpad-lite/src/static/js/Changeset.js:2013:11)
    at Object.async.series.client.json.send.disconnect [as callback] (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node/handler/PadMessageHandler.js:589:35)
    at /var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/CloneAndAtomicLayer.js:139:17
    at exports.database.getSub (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/CacheAndBufferLayer.js:378:7)
    at exports.database.get (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/CacheAndBufferLayer.js:157:5)
    at exports.database.getSub (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/CacheAndBufferLayer.js:350:8)
    at doOperation [as operatorFunction] (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/CloneAndAtomicLayer.js:133:18)
    at exports.channels.emit (/var/www/node/etherpad-lite-dev/etherpad-lite/src/node_modules/ueberDB/node_modules/channels/channels.js:38:11)

@ghost ghost assigned Pita Feb 13, 2013
@amtep
Copy link
Contributor

amtep commented Feb 19, 2013

Cool!

This could be either very interesting or very boring :) If it's a bug in the stresstest client, then it doesn't help us much. But if the clients are sending correct changesets but the server somehow munges them then we really have the root cause.

Either way, though, the server shouldn't crash no matter what the clients send. So some version of mrneil's patch is needed.

@marcelklehr
Copy link
Contributor

So, this is @mrneil's approach to fix the problem:
#472 (comment)

@rasos
Copy link
Contributor

rasos commented Feb 26, 2013

The above fix does not help here (latest version 7a0ad32, applied in PadMessageHandler.js around line 530).

After server start, a pad content is delivered, then we have in the log file:
ESC[36m[2013-02-26 21:24:16.788] [DEBUG] ueberDB - ESC[39mGETSUB - globalAuthor:a.WZAfeDGHYipnTU0O["name"] - "Talie"
ESC[36m[2013-02-26 21:24:16.788] [DEBUG] ueberDB - ESC[39mGET - globalAuthor:a.2B7OTb1sMz4paEEn - {"colorId":"#55c4d9","name":"Brad","timestamp":1361821403366,"padIDs"
:{"weeklongwarriors":1}} - from cache
ESC[32m[2013-02-26 21:24:16.788] [INFO] message - ESC[39mto iBoQF9yURV8-2OVyLU0m: {"disconnect":"userdup"}
ESC[32m[2013-02-26 21:24:16.788] [INFO] message - ESC[39mto iBoQF9yURV8-2OVyLU0m: {"disconnect":"userdup"}
ESC[36m[2013-02-26 21:24:16.788] [DEBUG] ueberDB - ESC[39mGETSUB - globalAuthor:a.WZAfeDGHYipnTU0O["name"] - "Talie"
ESC[36m[2013-02-26 21:24:16.788] [DEBUG] ueberDB - ESC[39mGET - globalAuthor:a.2B7OTb1sMz4paEEn - {"colorId":"#55c4d9","name":"Brad","timestamp":1361821403366,"padIDs"
:{"weeklongwarriors":1}} - from cache
ESC[32m[2013-02-26 21:24:16.788] [INFO] message - ESC[39mto iBoQF9yURV8-2OVyLU0m: {"disconnect":"userdup"}
ESC[32m[2013-02-26 21:24:16.788] [INFO] message - ESC[39mto iBoQF9yURV8-2OVyLU0m: {"disconnect":"userdup"}

The last line repeats a thousand times, then:

ESC[32m[2013-02-26 21:24:17.941] [INFO] message - ESC[39mto iBoQF9yURV8-2OVyLU0m: {"disconnect":"userdup"}
ESC[31m[2013-02-26 21:24:17.942] [ERROR] console - ESC[39m[RangeError: Maximum call stack size exceeded]
ESC[32m[2013-02-26 21:24:17.943] [INFO] console - ESC[39mgraceful shutdown...

@JohnMcLear
Copy link
Member

@rasos can you replicate or is it a one off? Also any chance you can clean your paste up?

@rasos
Copy link
Contributor

rasos commented Feb 26, 2013

Error can be replicated, we currently loop:

RESTART!
Ensure that all dependencies are up to date...
npm WARN package.json [email protected] No README.md file found!
Ensure jQuery is downloaded and up to date...
Clear minfified cache...
ensure custom css/js files are created...
start...
[2013-02-26 21:40:59.026] [WARN] socket.io - client not handshaken 'client should reconnect'
[2013-02-26 21:41:00.531] [ERROR] console - [RangeError: Maximum call stack size exceeded]

@JohnMcLear
Copy link
Member

Can you please extract the pad using

node bin/extractPadData.js %padID%

replacing %padID% with the padId of the pad that keeps failing

Once you have the .db file of the pad (it will be in your folder) put it on the web and share the link here so we can investigate further

@rasos
Copy link
Contributor

rasos commented Feb 26, 2013

Mmh, the size of the padID.db file is 0, replacing the padID with the pad name. I can read in the logfile that somebody uses the pad to exchange even passwords, so I probably should not post that in the web. Any way I can send that to you in pm? Or any hint to disactivate that pad?

@JohnMcLear
Copy link
Member

email me it john splat mclear.co.uk
to "disactivate" a pad just delete it using the script I pushed today --> https://github.com/ether/etherpad-lite/pull/1557/files
but make sure you export first, if we don't know what's causing the error we can't fix it.

@mrneil
Copy link
Contributor

mrneil commented Feb 27, 2013

The fix I posted above will not directly solve the issue experienced by @rasos because the log does not indicate it is being caused by a mismatched follow error.

However, the report reminds me of issue #800. That one occurred after several hundred NEW_CHANGES messages were sent, while this latest one is getting hung up on duplicate users. Can the fix described there (#1025) be adapted to solve this problem, too?

bacongobbler pushed a commit to Stackato-Apps/etherpad-lite that referenced this issue Nov 15, 2013
bacongobbler pushed a commit to Stackato-Apps/etherpad-lite that referenced this issue Nov 15, 2013
@JohnMcLear
Copy link
Member

Yea @rasos went off topic, basically I can close this but I'm mindful stress testing may throw some exceptions we need to debug.

muxator added a commit that referenced this issue Nov 25, 2019
This upgrade solves the high-severity vulnerabilities regarding
https-proxy-agent that were still present in 8e6bca4.

The output of `npm audit` goes from this:
  found 29 vulnerabilities (3 low, 26 high) in 13338 scanned packages
    run `npm audit fix` to fix 4 of them.
    1 vulnerability requires semver-major dependency updates.
    24 vulnerabilities require manual review. See the full report for details.

To this:
found 5 vulnerabilities (3 low, 2 high) in 13338 scanned packages
  1 vulnerability requires semver-major dependency updates.
  4 vulnerabilities require manual review. See the full report for details.


Changelog:
- https://github.com/npm/cli/releases

6.13.1 (2019-11-18)
    BUG FIXES
    938d6124d #472 fix(fund): support funding string shorthand (@ruyadorno)
    b49c5535b #471 should not publish tap-snapshot folder (@ruyadorno)
    3471d5200 #253 Add preliminary WSL support for npm and npx (@infinnie)
    3ef295f23 #486 print quick audit report for human output (@isaacs)

    TESTING
    dbbf977ac #278 added workflow to trigger and run benchmarks (@mikemimik)
    b4f5e3825 #457 feat(docs): adding tests and updating docs to reflect changes in registry teams API. (@nomadtechie)
    454c7dd60 #456 fix git configs for git 2.23 and above (@isaacs)

    DEPENDENCIES
    661d86cd2 [email protected] (@claudiahdz)

6.13.0 (2019-11-05)
    NEW FEATURES
    4414b06d9 #273 add fund command (@ruyadorno)

    BUG FIXES
    e4455409f #281 delete ps1 files on package removal (@NoDocCat)
    cd14d4701 #279 update supported node list to remove v6.0, v6.1, v9.0 - v9.2 (@ljharb)

    DEPENDENCIES
    a37296b20 [email protected]
    d3cb3abe8 [email protected]

    TESTING
    688cd97be #272 use github actions for CI (@JasonEtco)
    9a2d8af84 #240 Clean up some flakiness and inconsistency (@isaacs)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants