A very fast retry strategy hangs blocking commands forever #1718

manast · 2023-02-03T12:52:12Z

I was debugging an issue reported to BullMQ related to reconnections and I found something really strange that somehow seems to also affect blocking connections (need to investigate this more though).

How to reproduce

Run this simple code:

const Redis = require("ioredis");

const connection = new Redis({
  maxRetriesPerRequest: null,
  // Retry every 10 ms, no matter how many attempts have been made.
  retryStrategy: function (times) {
    return 10;
  },
});
// Issue one command each time the socket connects.
connection.on("connect", async () => {
  console.log("Connected to Redis!");
  const result = await connection.set("test", "test");
  console.log("OK, Command issued", result);
});
connection.on("error", (err) => console.error("Redis error", err.message));
connection.on("reconnecting", () => console.log("Reconnecting to Redis!"));

You will get this output:

Connected to Redis!
OK, Command issued OK

Leave the code running and stop your Redis instance. In my case, since I am running it in Docker:

docker stop some-redis

Wait a few seconds and then start it again:

docker start some-redis

The result would be:

Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Redis error connect ECONNREFUSED 127.0.0.1:6379
Reconnecting to Redis!
Connected to Redis!
Reconnecting to Redis!
Connected to Redis!
OK, Command issued OK
OK, Command issued OK
OK, Command issued OK
OK, Command issued OK

Notice the 4 lines of "OK, Command issued OK" in the output. Somehow the callback for the connect event was called 4 times instead of once. Now, if we modify the code to retry once per second instead:

const Redis = require("ioredis");

const connection = new Redis({
  maxRetriesPerRequest: null,
  // Retry once per second instead of every 10 ms.
  retryStrategy: function (times) {
    return 1000;
  },
});
// Issue one command each time the socket connects.
connection.on("connect", async () => {
  console.log("Connected to Redis!");
  const result = await connection.set("test", "test");
  console.log("OK, Command issued", result);
});
connection.on("error", (err) => console.error("Redis error", err.message));
connection.on("reconnecting", () => console.log("Reconnecting to Redis!"));

If we then repeat the same steps as above, we get the correct output, i.e. a single "OK, Command issued OK".

There seems to be a small hazard that occurs when the retries happen very close together. I am not sure yet whether it is completely harmless, since I am not done debugging my particular case, but I think this should be investigated and resolved if possible.
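Since the once-per-second run above behaved correctly, one possible stopgap is to put a floor under the retry delay. The following is only a sketch: `boundedRetryDelay`, the 200 ms step and the 20-second cap are illustrative assumptions, and the 1000 ms floor is simply the value that worked in the run above. It avoids the fast-retry window but does not address the underlying cause.

```javascript
// Hypothetical helper (not part of ioredis): computes a reconnect delay
// in milliseconds that never drops below 1 second.
const MIN_DELAY_MS = 1000;  // floor: the delay that behaved correctly above
const MAX_DELAY_MS = 20000; // cap: arbitrary upper bound, an assumption

function boundedRetryDelay(times) {
  // Grow linearly with the attempt count, then clamp to [floor, cap].
  const linear = times * 200;
  return Math.min(Math.max(linear, MIN_DELAY_MS), MAX_DELAY_MS);
}

console.log(boundedRetryDelay(1), boundedRetryDelay(10), boundedRetryDelay(1000));
```

To try it, pass the function as the `retryStrategy` option in the `new Redis({ ... })` call from the reproduction script.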

manast · 2023-02-03T13:52:58Z

My suspicions turned out to be correct. If you have issued a blocking command, then for some reason it gets stuck in that command, but ONLY when the retry is very fast. To see this, add this code to the script from the original issue:

async function run() {
  while (true) {
    console.log("Waiting for message...");
    const result = await connection.brpop("whatever", 1);
    console.log("Message received", result);
  }
}

run();

This performs a blocking pop with a 1-second timeout. While connected, it loops forever, timing out once per second. If you stop Redis while using a very fast retry delay and then start Redis again, the loop never resumes...

manast · 2023-02-10T09:42:49Z

@luin it would be great if you could peek at this issue, it seems like it is snowballing 🤔 anything I can do to help?

luin · 2023-02-11T07:50:54Z

@manast 👋

I was not able to reproduce the issue:

CleanShot.2023-02-11.at.15.48.56.mp4

No matter whether I changed retryStrategy to 10 or 1, I always got a single "Command issued". Am I missing anything?

manast · 2023-02-11T08:55:39Z

Is the Redis instance running locally or inside a Docker container? (In my case I run Redis in a Docker container, but the test itself runs on my local machine.)

manast · 2023-02-11T08:56:49Z

Also, did you try replacing the set command with brpop, as in the second comment?

luin · 2023-02-11T15:19:42Z

Thanks! I finally reproduced this with Docker. It happens when the reconnection succeeds but the ready check fails. Here is what our code does on each event:

On close, we move the unfulfilled commands from commandQueue to prevCommandQueue.
On connect, we reset commandQueue.
On ready, we resend the commands in prevCommandQueue.

When the issue happens, the events are emitted in the order close -> connect -> close -> connect -> ready, so brpop is lost at the second close, by which point commandQueue has already been reset.
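The event handling described above can be replayed with a toy model to see where the command disappears. This is a sketch, not ioredis's actual code: `ToyConnection` and `replay` are hypothetical names, and the model assumes that the save on close overwrites whatever was saved earlier.

```javascript
// Toy model of the three handlers described above (not the real ioredis code).
class ToyConnection {
  constructor() {
    this.commandQueue = [];     // commands sent and still awaiting a reply
    this.prevCommandQueue = []; // commands saved for resending
    this.resent = [];           // commands actually resent on "ready"
  }
  send(name) {
    this.commandQueue.push(name);
  }
  onClose() {
    // Assumption: the saved queue is overwritten, not merged.
    this.prevCommandQueue = this.commandQueue;
  }
  onConnect() {
    this.commandQueue = [];
  }
  onReady() {
    this.resent.push(...this.prevCommandQueue);
    this.prevCommandQueue = [];
  }
}

// Issues one brpop, then feeds the given event sequence to the model.
function replay(events) {
  const conn = new ToyConnection();
  conn.send("brpop");
  for (const event of events) {
    if (event === "close") conn.onClose();
    else if (event === "connect") conn.onConnect();
    else if (event === "ready") conn.onReady();
  }
  return conn.resent;
}

const normal = replay(["close", "connect", "ready"]);
const racy = replay(["close", "connect", "close", "connect", "ready"]);
console.log("normal order resends:", normal.length, "command(s)");
console.log("racy order resends:", racy.length, "command(s)");
```

In the normal order the saved brpop is resent on ready. In the racy order the second close saves the already-reset queue, so nothing is left to resend.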

manast · 2023-02-11T15:32:18Z

Yes, that makes a lot of sense. I have had similar issues in the past, as it is difficult to guarantee the order of asynchronous events that fire in quick succession and can potentially interleave.
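To check the event order on a particular setup, a small recorder can be attached to the connection. This is a minimal sketch: `recordEvents` is a hypothetical helper, and a plain Node.js `EventEmitter` stands in for the ioredis connection so that it runs without Redis.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: records the order in which the named events fire.
function recordEvents(emitter, names) {
  const seen = [];
  for (const name of names) {
    emitter.on(name, () => seen.push(name));
  }
  return seen;
}

// Stand-in for a connection; replays the order reported in this thread.
const fake = new EventEmitter();
const order = recordEvents(fake, ["connect", "ready", "close", "reconnecting"]);
for (const name of ["close", "connect", "close", "connect", "ready"]) {
  fake.emit(name);
}
console.log(order.join(" -> "));
```

With a real connection, the same helper would be called on the ioredis instance instead of `fake`, and `order` inspected after Redis is restarted.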

## [5.3.1](v5.3.0...v5.3.1) (2023-02-12)

### Bug Fixes

* Fix commands not resend on reconnect in edge cases ([#1720](#1720)) ([fe52ff1](fe52ff1)), closes [#1718](#1718)
* Fix db parameter not working with auto pipelining ([#1721](#1721)) ([d9b1bf1](d9b1bf1))

github-actions · 2023-02-12T02:14:13Z

🎉 This issue has been resolved in version 5.3.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

manast mentioned this issue Feb 3, 2023

Fix/larger default retry time taskforcesh/bullmq#1654

Merged

manast changed the title ~~A very fast retry strategy gives a strange behaviour~~ A very fast retry strategy hangs blocking commands forever Feb 3, 2023

manast mentioned this issue Feb 5, 2023

Jobs stuck in delayed state indefinitely taskforcesh/bullmq#1656

Closed

andris9 mentioned this issue Feb 6, 2023

Changed Redis reconnect backoff to delay at least 1 second postalsys/emailengine#266

Merged

luin added the need reproduce label Feb 11, 2023

luin added bug and removed need reproduce labels Feb 11, 2023

luin added a commit that referenced this issue Feb 11, 2023

Fix commands not resend on reconnect in edge cases

cd96023

Closes #1718

luin mentioned this issue Feb 11, 2023

Fix commands not resend on reconnect in edge cases #1720

Merged

luin closed this as completed in #1720 Feb 12, 2023

luin added a commit that referenced this issue Feb 12, 2023

fix: Fix commands not resend on reconnect in edge cases (#1720)

fe52ff1

Closes #1718

github-actions bot added the released label Feb 12, 2023

DanielNetzer mentioned this issue Feb 17, 2023

addBulk performance issues taskforcesh/bullmq#1670

Open

luin added a commit that referenced this issue Apr 15, 2023

fix: Fix commands not resend on reconnect in edge cases (#1720)

3fb8b7f

Closes #1718

Stanislav1975 mentioned this issue Aug 9, 2024

[Snyk] Upgrade ioredis from 4.28.4 to 5.4.1 Stanislav1975/shields#603

Open

vs4vijay mentioned this issue Sep 10, 2024

[Snyk] Upgrade: dotenv, ioredis vs4vijay/KARA#105

Open