
LockError: Unable to fully release the lock on resource #65

Closed · nmargaritis opened this issue Feb 14, 2020 · 38 comments

@nmargaritis (Author) commented Feb 14, 2020

After upgrading to v4, I started getting this error: LockError: Unable to fully release the lock on resource.

Redis server v5.0.7 (it appears to have the same problem in newer versions as well).

I rolled back to version 3.1.2 for now.

This could be related to #39.

@YSMull commented Feb 17, 2020

me too!

@maxs15 commented Feb 22, 2020

Same issue on my side!

@mike-marcacci (Owner)

Hi all! Just now getting a chance to look at this. I've combed through the logic and tests and I haven't found anything suspicious.

A few questions:

  1. Are these running against redis instances or clusters?
  2. Are there multiple instances/clusters?
  3. Do you have any non-default settings on your clusters that affect behavior?
  4. What client lib are you using (ioredis/node-redis)?
  5. Do you have any non-default settings on your client lib?

Is there any chance one of you can create a minimal reproduction of this, perhaps using docker-compose to set up the environment?

@nmargaritis (Author)

> Hi all! Just now getting a chance to look at this. I've combed through the logic and tests and I haven't found anything suspicious.
>
> A few questions:
>
>   1. Are these running against redis instances or clusters?
>   2. Are there multiple instances/clusters?
>   3. Do you have any non-default settings on your clusters that affect behavior?
>   4. What client lib are you using (ioredis/node-redis)?
>   5. Do you have any non-default settings on your client lib?
>
> Is there any chance one of you can create a minimal reproduction of this, perhaps using docker-compose to set up the environment?

  1. Instances
  2. Single instance in my use case
  3. Nothing special
  4. ioredis ("ioredis": "4.15.1")
  5. I am providing my redlock + ioredis init below

this.store = new Redis({
    host: process.env.REDIS_HOST,
    port: process.env.REDIS_PORT,
    password: process.env.REDIS_PASSWORD,
    connectionName: 'translations',
    keepAlive: true,
    retryStrategy: function (times) {
        // back off linearly, capped at 60s between reconnection attempts
        return Math.min(times * 3000, 60000);
    },
    reconnectOnError: err => {
        // Only reconnect when the error starts with "READONLY"
        return err.message.startsWith('READONLY');
    }
});

this.redlock = new Redlock([this.store], {
    driftFactor: 0.01, // expected clock drift factor (multiplied by the TTL)
    retryCount: 25,
    retryDelay: 300, // time in ms
    retryJitter: 200 // time in ms
});

Not sure about the latter. By the way, Redis is on AWS (managed cache), if this is helpful.

@amirzandi

I still have a problem with the following error:

{ LockError: Unable to fully release the lock on resource "lock:tournament:winner:289247".
    at Command.loop [as callback] (/var/www/tornoq/node_modules/redlock/redlock.js:264:18)
    at normal_reply (/var/www/tornoq/node_modules/redis/index.js:726:21)
    at RedisClient.return_reply (/var/www/tornoq/node_modules/redis/index.js:824:9)
    at JavascriptRedisParser.returnReply (/var/www/tornoq/node_modules/redis/index.js:192:18)
    at JavascriptRedisParser.execute (/var/www/tornoq/node_modules/redis-parser/lib/parser.js:574:12)
    at Socket.<anonymous> (/var/www/tornoq/node_modules/redis/index.js:274:27)
    at Socket.emit (events.js:193:13)
    at addChunk (_stream_readable.js:296:12)
    at readableAddChunk (_stream_readable.js:277:11)
    at Socket.Readable.push (_stream_readable.js:232:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:150:17)
  name: 'LockError',
  message: 'Unable to fully release the lock on resource "lock:tournament:winner:289247".',
  attempts: undefined }

Please help

@amirzandi

@mike-marcacci please check this error.

@amirzandi

@mike-marcacci Why don't you follow up on issues? We are facing a lot of problems with this error. If you are not able to resolve it, please let us know so we can switch to another component.

@nmargaritis (Author)

> @mike-marcacci Why don't you follow up on issues? We are facing a lot of problems with this error. If you are not able to resolve it, please let us know so we can switch to another component.

@amirzandi Hi, have you tried rolling back to version 3.1.2 until this issue is resolved?
Maybe the maintainer does not have enough information to debug this issue for now.

@amirzandi

I finally found the problem and resolved it myself. I plan to use another component in order to get rid of such issues.

@nmargaritis (Author)

> I finally found the problem and resolved it myself. I plan to use another component in order to get rid of such issues.

@amirzandi Could you please create a pull request so that the fix is merged back into the project and anyone can use it?

@amirzandi

I only increased the lock time to 60000 milliseconds and it is now working fine.

@seanfisher commented Apr 20, 2020

I had this same error in version 4.1.0, which went away after I removed the keyPrefix option from my ioredis (v4.16.2) configuration.

@b00tsy commented Apr 30, 2020

For me this goes away when going back to 3.1.2. But looking at the timing, it appears that this "new" error message pops up when the lock has expired. E.g. if locking for 250 ms and the actual processing while the lock is held takes 300 ms, an error gets thrown. Maybe this was enforced in 4.0.0 and was previously handled silently?

@mike-marcacci (Owner)

Hey folks, sorry for the slow replies: I've been unbelievably busy this year so far. I think that @b00tsy may be onto something here, especially given @amirzandi's workaround.

Version 3 should have rejected with the same error, but perhaps it didn't? Unlocking was originally a best-effort operation that wouldn't report an error, but that was changed quite early on.

For those of you who have reverted to version 3, can you watch your redis instances for key expirations? Something like this:

config set notify-keyspace-events Ex
psubscribe __keyevent@0__:expired

If all is well and you are acquiring or extending locks with adequate durations, you shouldn't see any keys that match your locks expiring.
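
If you would rather watch from Node than from redis-cli, here is a minimal sketch of the same idea using ioredis; the database index (0) and the idea of eyeballing the expired keys for your lock names are assumptions you may need to adjust for your setup.

const Redis = require('ioredis');

const admin = new Redis();      // used once to enable keyspace notifications
const subscriber = new Redis(); // a subscribing connection cannot issue other commands

async function watchExpirations() {
  // "Ex" = publish keyevent notifications for expired keys
  await admin.config('SET', 'notify-keyspace-events', 'Ex');

  await subscriber.psubscribe('__keyevent@0__:expired');
  subscriber.on('pmessage', (pattern, channel, key) => {
    // If a key matching one of your lock resources shows up here,
    // that lock expired before it was released.
    console.log(`expired: ${key}`);
  });
}

watchExpirations().catch(console.error);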

@b00tsy commented May 1, 2020 via email

@bohendo commented May 21, 2020

I can reproduce this issue with 100% reliability with these steps:

  1. Process 1 requests the lock and then never unlocks.
  2. Process 2 requests the lock, and it is acquired once process 1's lock expires.
  3. Process 2 unlocks within the timeout, and this error gets thrown.

Logs from our locking service (using redlock & ioredis) that demonstrate this:
2020-05-21T09:15:05.159Z [LockService] Acquiring lock for 0x055b7874f51964fD587969D6666474e99Ac4317F (TTL: 10000 ms)
2020-05-21T09:15:05.159Z [LockService] Acquired lock for 0x055b7874f51964fD587969D6666474e99Ac4317F after waiting 0 ms
2020-05-21T09:15:05.217Z [LockService] Waiting on lock for 0x055b7874f51964fD587969D6666474e99Ac4317F (locked: 0x055b..317F)
2020-05-21T09:15:15.176Z [LockService] Acquired lock for 0x055b7874f51964fD587969D6666474e99Ac4317F after waiting 9959 ms
2020-05-21T09:15:18.665Z [LockService] Releasing lock for 0x055b7874f51964fD587969D6666474e99Ac4317F after 3489 ms
2020-05-21T09:15:18.669Z [LockService] Error while unlocking 0x055b7874f51964fD587969D6666474e99Ac4317F: Unable to fully release the lock on resource "0x055b7874f51964fD587969D6666474e99Ac4317F".

@mike-marcacci (Owner)

@bohendo do you only experience this when these are separate processes? Following your algorithm does not produce any issues in my tests:

const Redlock = require('./redlock');
const IORedis = require('ioredis');

const client = new IORedis();
const redlock = new Redlock([client], {
  retryCount: 10,
  retryDelay: 100,
  retryJitter: 50
});


async function main (){
  console.log("a");
  await redlock.lock("regression", 200);
  console.log("b");
  const second = await redlock.lock("regression", 400);
  console.log("c");
  await second.unlock();
  console.log("d");
}

main().then(console.log, console.error);

And just to be 100% sure, there appear to be ~3.5 seconds in your logs between acquiring the lock and releasing it: I assume the lock duration is greater than this?

@stavalfi commented Jun 16, 2020

Same here. The error occurs sometimes, but so far I only lock inside the same process. Any solution? Alternatives?

@nmargaritis (Author) commented Jun 18, 2020

The only input I can provide is that you should not simply increase the lock/unlock timeout blindly.

This is not really a solution; it can only mask other problems in the library or in your system. If there is something wrong with your system, that can result in lock-timing errors, either when acquiring or when releasing. For example, if a lengthy process takes more time than the total lock acquisition time, then any other attempt to acquire the lock will time out and throw an error.

Such cases should be considered false positives with regard to the issue I initially opened, as they are related to misconfiguration or bugs in the system using the lock mechanism.

For my use case, the system was tested, with each operation reaching a total lock time of 1-2 seconds at most. After upgrading we started hitting the error consistently. The only workaround so far has been to roll back to redlock version 3.1.2.

I believe that there is something wrong with the changes introduced here.

@gnitsenko93

@mike-marcacci Here is a script to reproduce the bug:

'use strict';

const Redis = require('ioredis');
const Redlock = require('redlock');

const TTL_MS = 2000;            // lock TTL in milliseconds
const ADDITIONAL_TIMEOUT = 100; // wait a bit past the TTL so the lock has expired
const KEY = 'key';

const redis = new Redis();
const redlock = new Redlock([redis]);

redlock.lock(KEY, TTL_MS)
    .then(lock => {
        console.log(`resource is locked, key=[${KEY}]`);

        setTimeout(() => {
            lock.unlock()
                .then(() => {
                    console.log(`resource is unlocked, key=[${KEY}]`);
                    process.exit(0);
                })
                .catch(error => {
                    console.log(`error on resource unlocking, error=[${error}]`);
                    process.exit(1);
                });
        }, TTL_MS + ADDITIONAL_TIMEOUT);
    });

You can try running it with different library versions to confirm that it works for 3.1.2 but not for 4.1.0.

I personally used Redis server version 3.2.12 in a Docker container and ioredis 4.14.2.

My suggestion is that this functionality was broken by changing the Lua scripts for locking and unlocking.

Seems like a critical bug, since application code now gets an error on an attempt to unlock a resource after its timeout has been exceeded.

@gnitsenko93 commented Jun 23, 2020

@mike-marcacci So the problem is that in 3.1.2:

  1. The lock script sets a value with an expiration timeout.
  2. The unlock script deletes the value if it exists and returns 0 otherwise.
  3. There is a check:

if (response === 0 || response === 1)
    votes++;

that passes when the unlocked value no longer exists and increments the votes variable, so the error was not thrown.

As for 4.1.0, the algorithm has been broken:
...
  3. There is the changed condition:

if (response === resource.length || response === '' + resource.length)
    votes++;

that does not pass for that case.
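
For readers following along, the kind of unlock script being discussed looks roughly like the canonical single-key example from the Redis distributed-lock docs. This is an illustration only (redlock's shipped scripts cover multiple resources and differ in detail), shown via ioredis just to make the 0/1 return value concrete.

const Redis = require('ioredis');
const redis = new Redis();

// Canonical single-key unlock script (illustration, not redlock's exact script):
// delete the key only if it still holds our lock value, returning the number of
// keys deleted (1), or 0 if the key already expired or belongs to someone else.
const UNLOCK_SCRIPT = `
  if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
  else
    return 0
  end
`;

function unlock(resource, value) {
  // ioredis: eval(script, numberOfKeys, ...keys, ...args)
  return redis.eval(UNLOCK_SCRIPT, 1, resource, value);
}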

@mike-marcacci (Owner)

@gnitsenko93 perhaps I'm missing something here (I'm still on my first coffee), but in your reproduction the lock has already expired and the keys have been evicted by redis... so calling unlock should error. This way your application can alert you to the fact that the lock expired and became re-acquirable before you unlocked it.

If this is the scenario that is being described in this issue, this is expected behavior in v4 as described in CHANGELOG.md. This would make sense given my inability to reproduce an issue that folks are clearly seeing in the wild.

Now, perhaps this isn't a scenario that you would consider an error (and initially I didn't either, given this library's earlier behavior). However, given that the purpose of this library is to guarantee exclusivity of a particular resource, I would consider it an error where the specified expiration breaks this guarantee.

@gnitsenko93

@mike-marcacci Got ya. Closing my pull request.

@gnitsenko93

@mike-marcacci Still, there is a small inaccuracy in the changelog.

> Change behavior of unlock to return an error if one or more entries cannot be removed.

It should actually be:

> Change behavior of unlock to return an error if zero or more entries cannot be removed.

@SimenB commented Sep 11, 2020

> Now, perhaps this isn't a scenario that you would consider an error (and initially I didn't either, given this library's earlier behavior). However, given that the purpose of this library is to guarantee exclusivity of a particular resource, I would consider it an error where the specified expiration breaks this guarantee.

@mike-marcacci could the error returned get more details? Right now it says Unable to fully release the lock on resource but if it's due to the resource being gone (or something else) it'd be great if the error said so instead of a more generic "failed".

(mostly asking since we're seeing these errors now when we didn't before, and more context on what is failing would help us narrow down the issue 🙂)

@ostar0816

I think we could just catch that error and handle it so it doesn't propagate, because it doesn't harm anything; it's essentially a warning. We just need to set the TTL properly according to our needs, and it makes sense that the lock has expired once the TTL has passed.
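
A minimal sketch of that approach against the v4-style lock()/unlock() API; checking err.name === 'LockError' matches the error object shown earlier in this thread, but verify it against the version you run.

const Redis = require('ioredis');
const Redlock = require('redlock');

const redlock = new Redlock([new Redis()]);

async function withLock(resource, ttl, work) {
  const lock = await redlock.lock(resource, ttl);
  try {
    return await work();
  } finally {
    try {
      await lock.unlock();
    } catch (err) {
      // If the lock already expired, unlocking fails; log it as a warning
      // instead of letting it mask the result of `work()`.
      if (err.name === 'LockError') {
        console.warn(`lock on ${resource} was not fully released: ${err.message}`);
      } else {
        throw err;
      }
    }
  }
}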

@hisabimbola

I started having this issue too. We have been on version 4 for a while, but today, out of nowhere, I started getting this error about being unable to fully release the lock. I listened for the expired event and do not see any event for the lock key.

mike-marcacci mentioned this issue Dec 12, 2020
@winnochan

I now also see the bug in 4.2.0. Which version should I use to avoid it?

@gregbarcza

+1

@Bhautik0110

Same issue when upgrading to the latest.

@Dakuan commented Jul 25, 2021

Does the new v5 fix this even if you don't use the new "using" syntax?
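
For reference, a rough sketch of the v5-style "using" pattern; the details here are taken from the 5.x betas and are assumptions to verify against the README of the version you actually install.

// sketch.mjs
import Client from 'ioredis';
import Redlock from 'redlock';

const redlock = new Redlock([new Client()], { retryCount: 10 });

// `using` acquires the lock, automatically extends it while the routine runs,
// and releases it when the routine settles.
await redlock.using(['a-resource'], 5000, async (signal) => {
  // ...long-running work...
  if (signal.aborted) {
    // the lock could not be extended in time
    throw signal.error;
  }
});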

@youkaisteve

I have the same issue when acquiring with a TTL shorter than the task's execution time.

@apurva1112 commented Aug 25, 2021

This error occurs when we attempt to release an already expired lock.

A condition like the one below while releasing the lock would help:

if (lock.expiration < Date.now() && lock.expiration != 0) {
  lock.unlock()
}

@Bhanuprathap

When you receive an "Unable to fully release the lock on resource" error, the operation after acquiring the lock took more time than the TTL given in the lock call. Try increasing the TTL value and verify again.
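
Rather than guessing one very large TTL up front, another option with the v4 API is to extend the lock while the work is still running. A rough sketch, assuming lock.extend(ttl) as documented for v4 and a hypothetical longRunningWork() function:

const Redis = require('ioredis');
const Redlock = require('redlock');

const redlock = new Redlock([new Redis()]);

async function doWorkUnderLock(resource) {
  let lock = await redlock.lock(resource, 10000); // initial 10s TTL

  // Periodically push the expiration out while the work is still running.
  const keepAlive = setInterval(async () => {
    try {
      lock = await lock.extend(10000);
    } catch (err) {
      console.warn(`could not extend lock on ${resource}: ${err.message}`);
    }
  }, 5000);

  try {
    await longRunningWork(); // hypothetical work function
  } finally {
    clearInterval(keepAlive);
    await lock.unlock().catch(err =>
      console.warn(`could not release lock on ${resource}: ${err.message}`)
    );
  }
}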

@mike-marcacci (Owner)

Hi folks! I'm going to go ahead and close this, as the newest published versions are effectively full rewrites and any residual bugs from the former versions are unlikely to persist. It's still not entirely clear whether there is a bug here at all or just expected behavior when a routine takes longer than the lock. Regardless, if anybody encounters this again, please open a new issue that we can constrain to a particular version.

@nexun commented May 17, 2022

> if (lock.expiration < Date.now() && lock.expiration != 0) {
>   lock.unlock()
> }

Shouldn't it be lock.expiration > Date.now()?

@and-kost commented Nov 8, 2022

> if (lock.expiration < Date.now() && lock.expiration != 0) {
>   lock.unlock()
> }
>
> shouldn't it be lock.expiration > Date.now()?

I suppose it should be >.
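
For completeness, a corrected sketch of that guard, assuming lock.expiration is the millisecond timestamp the earlier snippet reads:

// Unlock only while the lock's recorded expiration is still in the future.
// (The original `!= 0` check becomes redundant once the comparison is flipped,
// since any future timestamp is non-zero.)
if (lock.expiration > Date.now()) {
  await lock.unlock();
}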

@dasbts commented Jul 5, 2023

> Hi folks! I'm going to go ahead and close this, as the newest published versions are effectively full rewrites and any residual bugs from the former versions are unlikely to persist. It's still not entirely clear whether there is a bug here at all or just expected behavior when a routine takes longer than the lock. Regardless, if anybody encounters this again, please open a new issue that we can constrain to a particular version.

Not true... :( Still encountering this.
