Retry operation when network connection is down. by ShivangiReja · Pull Request #31 · Azure/amqp-common-js

ShivangiReja · 2019-03-04T20:48:37Z

Description

Continue the retry operation when the internet connection goes down.

Created a new method that checks for internet connectivity.
If we are not connected to the internet, the retry operation continues.
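
The behavior described above can be sketched roughly as follows. This is an illustration only, not the code in lib/retry.ts: the names `retryWhileOffline`, `maxAttempts`, and `delayMs` are hypothetical, and the connectivity check is passed in as a function so the loop does not depend on any particular implementation.

```typescript
// Hypothetical error shape: an Error that may carry a `retryable` flag.
interface RetryableError extends Error {
  retryable?: boolean;
}

// Sketch of a retry loop that keeps retrying when the error is retryable
// or when the connectivity check reports that we are offline.
export async function retryWhileOffline<T>(
  operation: () => Promise<T>,
  isConnected: () => Promise<boolean>,
  maxAttempts = 3, // assumed default, not taken from the PR
  delayMs = 1000 // assumed default, not taken from the PR
): Promise<T> {
  let lastError: RetryableError | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err as RetryableError;
      // The connectivity check only runs when the error is not already retryable.
      const shouldRetry = lastError.retryable === true || !(await isConnected());
      if (!shouldRetry || attempt === maxAttempts) {
        break;
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With this shape, a non-retryable error seen while online is rethrown immediately, while the same error seen while offline leads to another attempt.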

Reference to any github issues

Reconnection problems when connection goes down azure-event-hubs-node#178

@ramya-rao-a @AlexGhiondea

AlexGhiondea · 2019-03-05T00:15:07Z

lib/retry.ts

+async function checkInternet(): Promise<boolean> {
+  return new Promise((resolve) => {
+    require("dns").lookup("ms.portal.azure.com", function (err: any): void {
+      if (err && err.code === "ENOTFOUND") {


Would it be possible that this DNS query resolves even when the connection is not there? For instance, if there is a DNS cache, the value might be returned from the cache without making a network call.

Good catch!
You are right: the lookup method can reuse the DNS cache and is therefore not guaranteed to perform any network communication. We will have to use the resolve method instead of lookup; it queries the DNS server and hence always uses the network to perform queries.

It's still possible to "resolve" without having "internet" FWIW. This occurs often on our wifi network - DNS queries work as the DNS server is within our LAN (still a few hops away), but access to the outside world is broken. I think this feature should be understood as a 90% sort of solution for "ethernet unplugged" and "switching to/from wifi" scenarios.
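
A minimal sketch of a resolve-based check, in the spirit of this thread, might look like the following. It is not the code that was merged. The `Resolver` type, the injectable `resolver` parameter, and the set of error codes treated as "offline" are assumptions made for illustration; `dns.resolve` itself is the real Node API. As noted above, a successful DNS query does not prove that the outside world is reachable.

```typescript
import { resolve as dnsResolve } from "dns";

// Minimal error shape: Node attaches a string `code` to DNS errors.
type DnsError = Error & { code?: string };

// Callback-style resolver, injectable so the check can be exercised offline.
type Resolver = (host: string, callback: (err: DnsError | null) => void) => void;

// Assumption: these codes are the ones treated as "no connectivity" in this sketch.
const OFFLINE_CODES = new Set(["ENOTFOUND", "ECONNREFUSED", "ETIMEOUT"]);

// dns.resolve issues a DNS query over the network, unlike dns.lookup.
const defaultResolver: Resolver = (host, callback) =>
  dnsResolve(host, (err) => callback(err));

export function checkNetworkConnection(
  host: string,
  resolver: Resolver = defaultResolver
): Promise<boolean> {
  return new Promise((resolve) => {
    resolver(host, (err) => {
      const offline =
        err != null && typeof err.code === "string" && OFFLINE_CODES.has(err.code);
      // Any other outcome (success or an unrelated error) counts as connected.
      resolve(!offline);
    });
  });
}
```

Treating unrelated errors as "connected" keeps the check conservative: it only changes retry behavior when the failure looks like a connectivity problem.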

AlexGhiondea

I added a couple of comments

lib/retry.ts

package.json

ramya-rao-a · 2019-03-06T04:10:21Z

lib/retry.ts

+      const isConnected = await checkNetworkConnection();
+      if (!isConnected) {
+        err.name = "ConnectionDetachError";
+        err.retryable = true;


We make this check for every error seen by every retry attempt. Can we reduce the number of checks being made?
One example is when the existing error is already retryable, then we don't need to make this check.

Also, maybe we should only add this check when we see the ServiceCommunicationError which maps to the ENOTFOUND. Thoughts?

"One example is when the existing error is already retryable, then we don't need to make this check." This makes sense. I'll add one more check: if the existing error is already retryable, there is no need to do anything.

"Also, maybe we should only add this check when we see the ServiceCommunicationError which maps to the ENOTFOUND." This check would be specific to our use case only. In the future, if a user gets a different non-retryable error, they will not be able to retry.

Also, I believe that if the error is not retryable and the network connection is down, we should retry the operation anyway. What do you think?

If the network was down, then we wouldn't be getting any error from the service in the first place, would we?

So you mean that we always get the same ServiceCommunicationError during a network failure? Doesn't it depend on the program state when the network goes down?

Also, what I am proposing above is that it doesn't matter whether we always get the ServiceCommunicationError or not, we should retry if the error is not retryable and we are not connected to the internet.

If the network is down and we make a network request, Node.js will always give an ENOTFOUND error, and amqp-common translates that to ServiceCommunicationError. Node.js doesn't care about the program state; it only sees a network request being made.

ENOTFOUND can happen due to multiple reasons, network being down is one of them.

Just one more concern about this :)

I was talking about the case where we have made a connection with the service, sent a request, and are now waiting for a response when the network fails. What would happen in this case? Will we get the same ENOTFOUND error?

In this case, rhea will send a disconnected event first. This results in calling detached for all open senders and receivers. The detached function sees that the SDK didn't close the sender/receiver and will try to reconnect it. The first retry attempt to run _init to reconnect them will fail with the ServiceCommunicationError, and that is when your network connection check will come in.

This case is specific to our Event Hub and Service Bus SDKs. If other users use the same retry method of amqp-common, will they get the same ENOTFOUND error?
In this case I was thinking it should be either a ConnectionTimeout or a SocketClose error?

Re: ENOTFOUND, neither Node docs, errno docs, or getaddrinfo docs say anything about this error. However, a search of GitHub code shows we are not the only ones responding to this kind of error, so I find it extremely unlikely Node would change this error in the future. I've sent some tweets on the topic, will get back here if I hear anything different.

ENOTFOUND is indeed not a real POSIX error, which explains why it's hard to find any info on it. It's actually a Node-specific error that isn't documented. The source of the error is here. While the comment implies the error will go away, on Twitter, Anna (Node core member) suggests the error won't change and agrees that the comment should be removed and the error documented. I can push on that if we find it important, but personally I'm satisfied that this behavior won't change because of all the things that would break.

"In this case I was thinking it should be either a ConnectionTimeout or a SocketClose error?"

Are you saying that we should rename ServiceCommunicationError or the ConnectionDetachedError?

Closing the loop here based on offline conversation

"This case is specific to our Event Hub and Service Bus SDKs. If other users use the same retry method of amqp-common, will they get the same ENOTFOUND error?"

amqp-common is tightly coupled with rhea-promise and rhea. So, all users of amqp-common will get the disconnected event when the network goes down, which they can then react to just how we do.

Also, let's rename ConnectionDetachedError to ConnectionLostError.
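
The outcome of this thread can be sketched as follows: skip the connectivity check when the error is already retryable, run it only for ServiceCommunicationError, and translate the error to a retryable ConnectionLostError when the check fails. This is an illustration under those assumptions rather than the merged code; the function name `translateIfConnectionLost` and the injected `checkNetworkConnection` parameter are hypothetical.

```typescript
// Hypothetical error shape: an Error that may carry a `retryable` flag.
interface RetryableError extends Error {
  retryable?: boolean;
}

export async function translateIfConnectionLost(
  err: RetryableError,
  checkNetworkConnection: () => Promise<boolean>
): Promise<RetryableError> {
  // Already retryable, or not the error that ENOTFOUND maps to:
  // no need to pay for a network check.
  if (err.retryable || err.name !== "ServiceCommunicationError") {
    return err;
  }
  const isConnected = await checkNetworkConnection();
  if (!isConnected) {
    // The network is down, so retrying later may succeed.
    err.name = "ConnectionLostError";
    err.retryable = true;
  }
  return err;
}
```

Keeping the check inside the `if` means at most one DNS query per non-retryable ServiceCommunicationError, rather than one per error per retry attempt.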

lib/retry.ts

ramya-rao-a · 2019-03-08T19:41:36Z

This fixes #32

ramya0820

As discussed offline: for the URL being pinged, depending on the client-side network setup, the client may only have access to the Azure service endpoint in question.
In such cases, using the right port number is important as well (depending on the client-side firewall configuration).
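
One way to take the port into account is to attempt a TCP connection to the service endpoint itself instead of a DNS query. The sketch below is an assumption about how that could look, not something this PR implements; `canReachEndpoint` and `timeoutMs` are hypothetical names. For AMQP over TLS the port would typically be 5671.

```typescript
import * as net from "net";

// Resolves to true if a TCP connection to host:port can be established
// within timeoutMs, false on error or timeout.
export function canReachEndpoint(
  host: string,
  port: number,
  timeoutMs = 5000 // assumed default
): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (reachable: boolean): void => {
      socket.destroy(); // release the socket whatever the outcome
      resolve(reachable);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}
```

A check like this exercises the same host, port, and firewall path as the real connection, at the cost of opening a socket for each check.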

lib/retry.ts

ShivangiReja added 2 commits March 4, 2019 12:34

Retry operation when network connection is down.

df07d7e

Updating nyc package version.

82d4476

AlexGhiondea reviewed Mar 5, 2019

lib/retry.ts (outdated)

AlexGhiondea reviewed Mar 5, 2019

ramya-rao-a reviewed Mar 5, 2019

lib/retry.ts (outdated)

ramya-rao-a reviewed Mar 5, 2019

package.json

Updating dns method.

69587f6

ramya-rao-a reviewed Mar 6, 2019

ShivangiReja added 2 commits March 6, 2019 11:23

Added checks with network connection check.

515702d

Removed ServiceComminicationError Check

5d185f5

bterlson mentioned this pull request Mar 6, 2019

[TS] Need section for robust networking Azure/azure-sdk#230

Closed

Added check for ServiceCommunicationError

a284ddf

ramya-rao-a reviewed Mar 7, 2019

lib/retry.ts (outdated)

ramya-rao-a reviewed Mar 7, 2019

lib/retry.ts (outdated)

ShivangiReja added 2 commits March 7, 2019 12:58

Move the call to checkNetworkConnection inside the if.

3d9a597

Adding host name to check the network connectivity.

9b4a8e4

AlexGhiondea reviewed Mar 8, 2019

lib/retry.ts

Merge branch 'master' into NetworkConn_Error

f8dd7f3

ramya-rao-a approved these changes Mar 8, 2019

ramya-rao-a mentioned this pull request Mar 8, 2019

When network connection is down, amqp-common gives non retryable error #32

Closed

ShivangiReja requested review from AlexGhiondea and bterlson March 8, 2019 21:09

ShivangiReja added the Client (Issues that refer to the client sdk) and Service Bus labels and removed the Service Bus label Mar 8, 2019

ShivangiReja self-assigned this Mar 8, 2019

ramya0820 suggested changes Mar 11, 2019

lib/retry.ts

bterlson mentioned this pull request Mar 11, 2019

Add support for Browsers & Web Sockets #29

Merged

5 tasks

ShivangiReja added 2 commits March 12, 2019 11:08

Method that removes the sender,receiver link and its underlying session

e192582

Merge branch 'NetworkConn_Error' of https://github.com/ShivangiReja/amqp-common-js into NetworkConn_Error

4ffe13c

bterlson approved these changes Mar 12, 2019

Revert "Method that removes the sender,receiver link and its underlying session"

f4e61ae

This reverts commit e192582.

ramya-rao-a merged commit eee2aa2 into Azure:master Mar 12, 2019

5 participants
