Intermittent but regular TLS failures #2150

Closed
rvagg opened this issue Jul 10, 2015 · 11 comments
Labels
test Issues and PRs related to the tests. tls Issues and PRs related to the tls subsystem. windows Issues and PRs related to the Windows platform.

Comments

@rvagg
Member

rvagg commented Jul 10, 2015

I haven't dug into this but I'm seeing regular TLS failures on CI now. I can't tell you how regular or what platforms specifically but if you dig through any-pr+multi you should find some examples. Here's one: https://jenkins-iojs.nodesource.com/job/iojs+pr+win/131/nodes=win2008r2/console whereas job #130 off the same code had an all-pass (note that 130 and 131 ran on different Windows 2008 machines so perhaps there's a machine-specific thing here?).

I'm guessing this is to do with the OpenSSL upgrade in some way although the changes brought in on that were pretty minor so I don't know how.

  • test-https-foafssl.js
  • test-tls-alert.js
  • test-tls-dhe.js
  • test-tls-ecdh.js
  • test-tls-ecdh-disable.js
  • test-tls-no-sslv3.js
  • test-tls-securepair-server.js
  • test-tls-session-cache.js
  • test-tls-set-ciphers.js

/ @nodejs/crypto

@mscdex mscdex added tls Issues and PRs related to the tls subsystem. test Issues and PRs related to the tests. labels Jul 10, 2015
@indutny
Member

indutny commented Jul 10, 2015

Yeah, I think this is something about the openssl apps that we bundle. This exit code stands for:

Stack buffer overflow / overrun

On Windows.
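
The comment above does not quote the numeric exit code. As a hedged aside, assuming the code in question is the NTSTATUS value `STATUS_STACK_BUFFER_OVERRUN` (0xC0000409), a small helper like the one below can turn the decimal exit status that a test harness prints into the hex form used in Windows documentation. The function name `toNtStatusHex` is hypothetical and not part of io.js.

```javascript
// Hypothetical helper (not from the io.js tree): format a process exit
// status as an 8-digit hex NTSTATUS string so it can be looked up.
function toNtStatusHex(code) {
  // `>>> 0` reinterprets a negative 32-bit value as unsigned.
  const unsigned = code >>> 0;
  return '0x' + unsigned.toString(16).toUpperCase().padStart(8, '0');
}

// Assumption: STATUS_STACK_BUFFER_OVERRUN is the status meant above.
const STATUS_STACK_BUFFER_OVERRUN = 0xC0000409;

console.log(toNtStatusHex(STATUS_STACK_BUFFER_OVERRUN));
```

The same helper accepts the signed form of the value, since some tools print exit codes as negative 32-bit integers.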

@joaocgreis
Member

I couldn't figure this one out so far. Here is what I know now:

On win2008r2-1 I noticed this:

  • Running openssl-cli works.
  • Running openssl-cli s_client crashes while displaying Loading 'screen' into random state -.
  • Running openssl-cli s_client -no_rand_screen works.
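
The three observations above could be reproduced with a small script along these lines. This is only a sketch: the binary name `openssl-cli`, the `OPENSSL_CLI` environment variable, the `probe` function, and the connect target are all assumptions for illustration, and the runner is injectable so the logic can be exercised without the real binary.

```javascript
const { spawnSync } = require('child_process');

// Assumption: the binary is on PATH or given via OPENSSL_CLI. In the
// io.js tree the tests obtain the path from common.opensslCli instead.
const OPENSSL_CLI = process.env.OPENSSL_CLI || 'openssl-cli';

// The three invocations listed above. The connect target is arbitrary.
const variants = [
  [],
  ['s_client', '-connect', '127.0.0.1:1'],
  ['s_client', '-no_rand_screen', '-connect', '127.0.0.1:1'],
];

// Runs each variant and records how the process ended.
// `run` defaults to spawnSync but can be replaced for a dry run.
function probe(bin, run = spawnSync) {
  return variants.map((args) => {
    // Empty stdin so an interactive prompt sees EOF; the timeout guards
    // against a hang.
    const result = run(bin, args, { input: '', timeout: 10000 });
    return {
      args: args.length ? args.join(' ') : '(no arguments)',
      status: result.status,
      signal: result.signal,
    };
  });
}

// Usage: console.table(probe(OPENSSL_CLI));
```

A non-zero `status` only on the second variant would match the behaviour described for win2008r2-1.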

These tests fail with timeout:

These tests all appear to fail when invoking openssl-cli:

@rvagg
Member Author

rvagg commented Jul 18, 2015

@shigeki this has been happening since the last openssl upgrade iirc, was there a floating patch that wasn't reapplied or something?

@shigeki
Contributor

shigeki commented Jul 18, 2015

@rvagg The patch is indeed applied at https://github.com/nodejs/io.js/blob/master/deps/openssl/openssl/apps/s_client.c#L1135-L1138. It seems that the issue occurs when invoking openssl s_client without -no_rand_screen. I'm looking at it now.

@shigeki
Contributor

shigeki commented Jul 18, 2015

@rvagg Is there any way to run CI jobs on iojs-rackspace-win2008r2-1?

@shigeki
Contributor

shigeki commented Jul 18, 2015

There seem to be no changes to related sources such as app_rand.c and rand_win.c in the upgrade. I guess readscreen() in rand_win.c causes the problem on the win2008r2 server in CI, but I can't be sure of the reason until I debug on that machine.
The current workaround is to add the -no_rand_screen option to all tls tests that use openssl-cli s_client, and I made a patch in shigeki@255e9d2. Do we make a further investigation or apply this just for workaround?
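
As a rough illustration of that workaround (not the actual patch), a test could build its s_client argument list through a helper that appends the flag only on Windows, since RAND_screen() is a Windows-only code path. The helper name `sClientArgs` and its parameters are hypothetical.

```javascript
// Hypothetical helper, not taken from the patch: builds the argument
// list for `openssl-cli s_client` against a local test server.
function sClientArgs(port, platform = process.platform) {
  const args = ['s_client', '-connect', `127.0.0.1:${port}`];
  // Skip the screen-based entropy gathering that crashes on the
  // win2008r2 CI machine. The flag is only added on Windows.
  if (platform === 'win32') {
    args.push('-no_rand_screen');
  }
  return args;
}

// Usage sketch: spawn(common.opensslCli, sClientArgs(common.PORT));
console.log(sClientArgs(8000, 'win32').join(' '));
```

Keeping the platform check in one place means each test does not have to repeat it.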

@rvagg
Member Author

rvagg commented Jul 18, 2015

@shigeki you can just go to https://jenkins-iojs.nodesource.com/job/iojs+pr+win/ and fire up a test run just for the Windows machines using the same inputs as for any-pr+multi

Could this be related to not having a login session on that machine? We've changed the way Jenkins starts to solve some other problems, perhaps it's related to this. Deferring to @joaocgreis and @orangemocha on this one.

@orangemocha
Contributor

> Could this be related to not having a login session on that machine?

I doubt that this is the cause of the issue. There is still a login session, just an automatic login.

@joaocgreis
Member

> Do we make a further investigation or apply this just for workaround?

@shigeki I believe we should do both. If I'm not mistaken, openssl-cli is used only for our tests and to test io.js, not to be tested itself, so we should lose nothing by using -no_rand_screen. I think your patch should land ASAP to make CI usable again.

It would be good to understand the root cause of this, so I think we should leave this issue open and investigate when possible.

@shigeki
Contributor

shigeki commented Jul 21, 2015

> If I'm not mistaken, openssl-cli is used only for our tests and to test io.js, not to be tested itself, so we should lose nothing by using -no_rand_screen.

That's right.

I ran several debug tests on iojs+pr+win and found that the GetDIBits() API call in readscreen() crashes openssl-cli at https://github.com/nodejs/io.js/blob/master/deps/openssl/openssl/crypto/rand/rand_win.c#L733-L734. GetDIBits() copies screen bitmap data into a buffer, but it fails when reading at a height of around 528, as seen in https://jenkins-iojs.nodesource.com/job/iojs+pr+win/169/nodes=win2008r2/console.

I'm not sure why it fails only on win2008 in CI, but the issue is surely caused by RAND_screen(), and -no_rand_screen can work around it. I will submit a PR soon.

shigeki pushed a commit that referenced this issue Jul 22, 2015
RAND_screen() causes stability issues in invoking openssl-cli s_client
on win2008r2 in CI. Disable to use it by adding -no_rand_screen
options to all tls tests that use common.opensslCli.

Fixes: #2150
PR-URL: #2209
Reviewed-By: Rod Vagg
Reviewed-By: Joao Reis
Reviewed-By: Jeremiah Senkpiel
@brendanashworth
Contributor

Is this fixed now?

@brendanashworth brendanashworth added the windows Issues and PRs related to the Windows platform. label Jul 26, 2015
@rvagg rvagg closed this as completed Jul 27, 2015

7 participants