Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket writes fail silently after reconnecting #4417

Closed
hyperlink opened this issue Dec 24, 2015 · 3 comments
Closed

Socket writes fail silently after reconnecting #4417

hyperlink opened this issue Dec 24, 2015 · 3 comments
Labels
net Issues and PRs related to the net subsystem.

Comments

@hyperlink
Copy link

I found this issue while testing how our app handles a network interruption and noticed this behavior with using node sockets in Linux. I have verified this issue happens with both 0.10.41 and 4.2.3.

Reproduce Setup

  • Need to setup two docker containers. Container deps: update openssl to 1.0.1j #1 hosts the TCP echo service and container test: don't remove empty.txt on win32 #2 runs the socket write code.
  • In the echo container I installed ecco
  • The socket write container will run the app but it needs to have iptables and netstat to run the test.
  • Run the test container test: don't remove empty.txt on win32 #2 on the host network and make it privilaged. --net=host --privileged
  • In our test we need to wait for an ETIMEDOUT which can take 15 minutes with the default setting. To make the test bearable run sysctl -w net.ipv4.tcp_retries2=8 in the test container to make this around 100s instead.

Dockerfile

FROM node:0.10.41
RUN apt-get update && apt-get install -y iptables net-tools
RUN npm install -g ecco
CMD ["bash"]

Socket write code

#!/usr/bin/env node

"use strict"

var net = require('net');

var MESSAGE_RATE = 1000;
var RANDOM_DATA = 700;
var host = '172.17.0.4';
var port = '9999';

var debug = require('debug')('SocketTest');

var socket = net.createConnection(port, host);
socket.addr = host + ':' + port;
socket.host = host;
socket.port = port;

socket.on('connect', function () {
  debug(this.address())
  var lastError = this.error;
  this.error = null;
  if (lastError) {
      debug('reconnect');
  } else {
      startLoop()
  }
});
socket.on('error', function (err) {
  this.error = err;
  debug('error', err);
});
socket.on('close', function (had_error) {
  debug('close');
  retry(this);
});
socket.on('end', function () {
  debug('Socket End')
  retry(this);
});
socket.buffer = new Buffer([]);
socket.on('data', function (data) {
  debug('received data', data)
});
socket.on('drain', function(){
  debug('socket drained');
});
socket.setKeepAlive(true, 20000);

function retry(s) {
  if(s.retrying || s.closing) return;
  s.retrying = true;
  s.retryTimer = setTimeout(function () {
      s.retrying = false;
      s.connect(s.port, s.host);
  }, 1000);
}

var messageNumber = 0
var lastTimer = null

function startLoop() {
    if (!socket.error) {
        var currentNumber = messageNumber
        debug('Sending message #'+ messageNumber)
        var ret = socket.write(new Array(RANDOM_DATA).join(Math.random().toString(36)) + ' Received message # ' + messageNumber++, sendCb)
       debug('Socket write %s', ret ? 'flushed to kernel': 'queued in memory');
       debug('Socket %d bytes written', socket.bytesWritten);
       debug('Socket %s bytes read', socket.bytesRead);
       debug('Socket bufferSize %d', socket.bufferSize);
    } else {
        debug('socket error not writing.')
    }

    lastTimer = setTimeout(startLoop, MESSAGE_RATE);

    function sendCb(){
        debug('Callback called for #%d', currentNumber);
        currentNumber = null
    }
}

Reproduce Steps

  1. Start the echo app on container deps: update openssl to 1.0.1j #1. ecco -l -v -p 9999 --address 0.0.0.0
  2. Make note of the IP address for that container using awk 'NR==1 {print $1}' /etc/hosts
  3. Update the IP in the test code (in my case it's 172.17.0.4) and run the test app
  4. Observe the data streaming in the echo container's terminal
  5. In container test: don't remove empty.txt on win32 #2 add the firewall rule to block the echo server's IP iptables -A INPUT -s "172.17.0.4" -j DROP;iptables -A OUTPUT -s "172.17.0.4" -j DROP;
  6. Notice the data stops streaming in the echo terminal. This is expected behavior.
  7. Run netstat -napc in container test: don't remove empty.txt on win32 #2 and observe the socket's Send-Q increasing over time.
  8. Wait until the ETIMEDOUT occurs and then remove the rule. iptables -D INPUT -s "172.17.0.4" -j DROP;iptables -D OUTPUT -s "172.17.0.4" -j DROP;
  9. Verify the rule is removed iptables -nvL
  10. After a bit notice container deps: update openssl to 1.0.1j #1 says client connected after container test: don't remove empty.txt on win32 #2 reconnects.
  11. The socket appears to write out data but no data appears to the echo terminal. This is unexpected.
  12. Apply the block rule again and do netstat again. We expect the Send-Q of the new connection to accumulate but it does not. This is also unexpected.

(don't forget to remove the rule after testing)

Observations

  • This issue doesn't happen on a Mac. I don't have an windows enviorment so I have not tested there.
  • This issue also does not happen when you send small amounts of data to socket.write. if you change the RANDOM_DATA to a lower number like 70 it will recover.
  • In 0.10.41 I noticed process._errno contains an ECANCELED error and doesn't look like socket code errors out the socket when it receives this error string from the handle.
@mscdex mscdex added the net Issues and PRs related to the net subsystem. label Dec 24, 2015
@Trott
Copy link
Member

Trott commented Jun 9, 2016

I'm guessing this behavior is unchanged in Node.js 6.x.x, but any chance someone can try it and confirm?

@Trott
Copy link
Member

Trott commented Jul 7, 2017

I don't suppose any Docker enthusiasts in @nodejs/docker might be able to see if this is still happening in Node.js 8 and/or determine if there's some non-Node.js cause to the issue as described?

@apapirovski
Copy link
Member

It has been over two years, no one has been able to investigate and no further information is available. While I don't doubt this is a valid report, it seems unlikely that anyone will take it up now. Also it's impossible to know whether this issue persists into more current Node.js versions (since 4.x is EOL very soon) or newer Docker versions.

That said, if you feel I've made an error in closing this, feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Issues and PRs related to the net subsystem.
Projects
None yet
Development

No branches or pull requests

4 participants