
read ECONNRESET in @grpc/grpc-js but not in grpc package #1994

Open
Siddhesh-Swami opened this issue Dec 22, 2021 · 14 comments
Siddhesh-Swami commented Dec 22, 2021

Description:
We were using the @grpc/grpc-js package in a Kubernetes cluster with an Alpine image, and recently got the chance to test it in production. Sporadically we observe read ECONNRESET on the client side, with no logs on the server side. We switched to an older version of @grpc/grpc-js (1.2.4), but the error was still observed.
In one of the microservices we used the grpc package with NestJS, and that service never produced read ECONNRESET. So we migrated all the microservices to the [email protected] package, and now we no longer see the read ECONNRESET error. The client takes a fairly long time to connect to the server (around 2-3 seconds), but no read ECONNRESET error is observed.

Environment:

  • OS name, version and architecture: Linux (Alpine)
  • Docker image: node:14.16.1-alpine
  • Kubernetes with Istio load balancing
  • Node version: 14.16.1
  • @grpc/proto-loader: 0.5.6
  • Earlier package: @grpc/grpc-js
  • New package: [email protected]

Please let me know if any more details would help.
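For context, our client creation with @grpc/proto-loader and @grpc/grpc-js looks roughly like the sketch below (the proto path, package name and service name are placeholders, not our actual setup). The same constructor shape works with the legacy grpc package, which is what made the migration roughly a drop-in change.

// Minimal client setup sketch (placeholder proto/package/service names).
import * as grpc from '@grpc/grpc-js';
import * as protoLoader from '@grpc/proto-loader';

const packageDefinition = protoLoader.loadSync('example.proto', {
  keepCase: true,
  longs: String,
  enums: String,
  defaults: true,
  oneofs: true,
});

// loadPackageDefinition returns a generic object tree, so we cast for brevity.
const proto = grpc.loadPackageDefinition(packageDefinition) as any;

// The generated client constructor takes (address, credentials, options?).
const client = new proto.example.Greeter(
  'example-server:50051',
  grpc.credentials.createInsecure(),
);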

@Siddhesh-Swami (Author)

Any updates please?

@vanthome

We have a similar issue that seems to appear only when our Node.js application is deployed on Kubernetes. Here is our stack:

  • Node 16
  • Docker engine on Kubernetes
  • Calico Networking
  • grpc-js Version 1.3.6

We are getting this error so frequently that it cannot be due to sporadic connectivity issues.

@haimrait

Any updates?
We are seeing this symptom as well.

hanstf commented Jun 21, 2022

We have a similar issue as well:

  • Node 16.14
  • Docker engine on k8s
  • Calico networking
  • grpc-js 1.5.7
  • gRPC server and client, 2 replicas each, without a service mesh

This normally happens after more than 10 hours of idle time.

bangbang93 commented Jun 22, 2022

Could this be something related to keepalive? After adding

      keepalive: {
        keepaliveTimeMs: ms('5m'),
      },

I have not seen a connection reset for several weeks.
The default keepalive options might be different between grpc and @grpc/grpc-js.
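Since NestJS came up earlier in the thread, here is a hedged sketch of where that keepalive block sits in a NestJS gRPC client registration; the module, token name, package, proto path and URL are placeholders, and it assumes a recent @nestjs/microservices version that exposes keepalive options on the gRPC transport config.

// Sketch only: placeholder names, assuming @nestjs/microservices with the gRPC transport.
import { Module } from '@nestjs/common';
import { ClientsModule, Transport } from '@nestjs/microservices';
import ms from 'ms';

@Module({
  imports: [
    ClientsModule.register([
      {
        name: 'EXAMPLE_SERVICE',
        transport: Transport.GRPC,
        options: {
          package: 'example',
          protoPath: 'example.proto',
          url: 'example-server:50051',
          keepalive: {
            // Send a keepalive ping every 5 minutes on an otherwise idle connection.
            keepaliveTimeMs: ms('5m'),
          },
        },
      },
    ]),
  ],
})
export class AppModule {}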

@railsonluna

Any updates?

@khanh-le-otsv

@bangbang93 After changing the code, have you faced the issue again?

@bangbang93

> @bangbang93 After changing the code, have you faced the issue again?

I have been rid of this issue for several months.

tomaswitek commented Oct 19, 2022

> @bangbang93 After changing the code, have you faced the issue again?
>
> I have been rid of this issue for several months.

@bangbang93 We have the same issue. Aren't you afraid performance could suffer after setting this option? 5 minutes sounds like a lot.

This is a comment from the source code, here is a link:

> The amount of time to wait for an acknowledgement after sending a ping

Nevertheless, I just applied it to our services; let's see how it plays out.

@bangbang93

It is keepaliveTimeMs, not keepaliveTimeoutMs. The comment you linked describes keepaliveTimeoutMs; the option I set controls the interval between pings:

/**
 * The amount of time in between sending pings
 */
private keepaliveTimeMs: number = KEEPALIVE_MAX_TIME_MS;
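To make the distinction concrete, here is a hedged sketch of the two fields side by side in the NestJS-style keepalive block used above; the values are illustrative, and the fields map to the grpc.keepalive_time_ms and grpc.keepalive_timeout_ms channel arguments.

      keepalive: {
        // Interval between keepalive pings on an idle connection (the option set above).
        keepaliveTimeMs: 5 * 60 * 1000,
        // Time to wait for a ping acknowledgement before the connection is considered dead
        // (the option the linked source comment describes; 20 seconds is the usual default).
        keepaliveTimeoutMs: 20 * 1000,
      },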

tomaswitek commented Oct 19, 2022

@bangbang93 Sorry, I sent the wrong link. I tried both options and I still get the error :(, but thanks for helping.

@HofmannZ
Copy link

This works for us:

import type { ChannelOptions } from '@grpc/grpc-js';

const channelOptions: ChannelOptions = {
  // Merge in whatever options you already have (placeholder name).
  ...existingChannelOptions,
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

✌️
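If you construct clients with @grpc/grpc-js directly (rather than through a framework), these channel options go into the third argument of the generated client constructor; a hedged sketch, reusing the placeholder proto object from earlier in the thread:

// Sketch only: proto.example.Greeter is a placeholder generated service client.
const client = new proto.example.Greeter(
  'example-server:50051',
  grpc.credentials.createInsecure(),
  channelOptions,
);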

@logidelic
Copy link

Thank you @HofmannZ. Is that fix reliable for you, or does it just make the problem less evident?

@HofmannZ

Hey @logidelic,

We ended up with the following config for the client:

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
const channelOptions: ChannelOptions = {
  // Merge in whatever options you already have (placeholder name).
  ...existingChannelOptions,
  // Send keepalive pings every 6 minutes, default is none.
  // Must be more than GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS on the server (5 minutes).
  'grpc.keepalive_time_ms': 6 * 60 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

And the following config for the server:

// See: https://grpc.github.io/grpc/cpp/md_doc_keepalive.html
const channelOptions: ChannelOptions = {
  // Merge in whatever options you already have (placeholder name).
  ...existingChannelOptions,
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

We've been running it in production for a couple of months, and it works reliably.
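For completeness, a hedged sketch of how the server-side options above can be applied when creating the server with @grpc/grpc-js; the bind address is a placeholder, and channelOptions refers to the server config above.

// Sketch only: pass the server-side ChannelOptions to the Server constructor.
import * as grpc from '@grpc/grpc-js';

const server = new grpc.Server(channelOptions);

server.bindAsync(
  '0.0.0.0:50051',
  grpc.ServerCredentials.createInsecure(),
  (err) => {
    if (err) {
      throw err;
    }
    server.start();
  },
);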
