Jedis returning wrong values after long command execution #909
I closed the original redis issue because it seems it is not related to redis but to Jedis (about to upload a simple case to reproduce it).
It would be nice if you could provide us with a small test case for this problem. Anyway, I'll try to take a look at it; a test case would make it easier 😄
We reproduced the issue with just GET commands on a single redis instance. About to upload sample test code that reproduces the issue.
Ok, thank you. I'll wait for this.
This is simple groovy code that reproduces the issue:

```groovy
package main.groovy

import redis.clients.jedis.*
import java.util.concurrent.*

static main(args) {
    // This populates the DB
    // Now we reproduce the issue
}
```

After it populates the DB, it spawns threads that GET foo:number. In another window, we run DEBUG SLEEP 10 from redis-cli. The code throws SocketTimeoutException, and as soon as the sleep ends the GET commands return wrong values. This happens even with just one thread.
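The wrong values are consistent with a response desync: when a read times out, the reply eventually arrives anyway and stays buffered on the socket, so the next command reads the previous command's reply. Below is a minimal, Redis-free simulation of that effect (a hypothetical `DesyncDemo` class, not Jedis code), using an in-memory stream as the "wire":

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class DesyncDemo {
    // Simulates a client that sent GET key1 and GET key2, timed out before
    // reading key1's reply, and then reads the stream again for key2.
    static String replySeenForKey2() {
        byte[] wire = "value-of-key1\nvalue-of-key2\n"
                .getBytes(StandardCharsets.UTF_8);
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(wire), StandardCharsets.UTF_8));
        try {
            // GET key1: in real life a SocketTimeoutException fires here,
            // so key1's reply is never consumed -- it stays in the buffer.

            // GET key2: the first thing in the buffer is key1's reply.
            return in.readLine();
        } catch (IOException e) {
            throw new RuntimeException(e); // cannot happen on an in-memory stream
        }
    }

    public static void main(String[] args) {
        // The client believes this is key2's value; it is actually key1's.
        System.out.println(replySeenForKey2()); // prints "value-of-key1"
    }
}
```

Discarding the socket (or draining its buffer) after a timeout is what prevents this; retrying on the same stream reproduces the "wrong values" above.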
Sorry about the groovy code, but it is the fastest thing a coworker could whip up to reproduce the problem. Also, I don't know if it helps, but the Jedis instances that fail have a `broken` instance var set to true.
When you get that error, could you please try again with this fix?
Thanks a lot! That was it! Where is this in the docs?
I believe there is nothing documented about this. But anyway, I believe we should throw an exception when the `broken` flag is set. What do you think about throwing this exception @HeartSaVioR @marcosnils @xetorthio?
About the exception: handling JedisConnectionException would be enough, since Jedis transforms IOException into JedisConnectionException. About documentation, sure, here it is. Since Jedis implements Closeable, we're fine calling Jedis.close() from a finally block, or we can just use try-with-resources. Hope this helps. ps. Maybe it's better to add a more detailed description to the wiki. ;)
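To illustrate the pattern being suggested, here is a self-contained sketch with hypothetical stand-ins for the pool and connection (FakePool/FakeConn are not real Jedis classes). The point is that close() alone decides whether the resource goes back to the pool or is dropped as broken, so try-with-resources (or a finally block) is all the caller needs:

```java
import java.io.Closeable;
import java.util.ArrayDeque;
import java.util.Deque;

class FakePool {
    final Deque<FakeConn> idle = new ArrayDeque<>();

    FakeConn getResource() {
        return idle.isEmpty() ? new FakeConn(this) : idle.pop();
    }
}

class FakeConn implements Closeable {
    private final FakePool pool;
    boolean broken = false;

    FakeConn(FakePool pool) { this.pool = pool; }

    String get(String key) {
        if (broken) throw new IllegalStateException("connection is broken");
        return "value:" + key;
    }

    @Override
    public void close() {
        // close() decides: healthy connections go back to the pool,
        // broken ones are simply dropped and never reused.
        if (!broken) pool.idle.push(this);
    }
}

class CloseableDemo {
    public static void main(String[] args) {
        FakePool pool = new FakePool();
        try (FakeConn conn = pool.getResource()) {
            System.out.println(conn.get("foo")); // prints "value:foo"
        }
        // The healthy connection was returned automatically by close().
        System.out.println(pool.idle.size()); // prints 1
    }
}
```

With this shape the caller never has to choose between two return methods; that choice lives inside close().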
@HeartSaVioR I see your point. But because Jedis is an "API" consumed by many developers, it would be nice to program defensively here. We know that users could close it the wrong way (not using Closeable, or using the wrong method), so it would be nice to try to prevent that, or at least warn about it. Maybe instead of throwing an exception if the
I ran into the problem using just a Jedis instance (no pool). Since my code is single threaded (using actors), I thought that I didn't need a pool. Even now, I can reproduce the issue without a pool (just try the sample code without one), so it doesn't seem right to require solving it at the pool level. Since Jedis already handles reconnection well, why doesn't it reconnect whenever it enters the broken state? What's the point of allowing calls on a connection you have already flagged as broken? Just my $0.02.
We should definitely start by improving the wiki, as it doesn't say anywhere what users need to do when an exception occurs. Regarding @mpaoletta's comments: Jedis shouldn't return wrong values even in the broken state, as that's a terrible thing. I'll make some quick tests and hopefully we can all come up with the optimal solution.
Maybe I don't have enough experience with socket programming. Could anyone explain to me how it can be safe to keep using a timed-out socket's InputStream? Btw, the reconnect strategy stated by @mpaoletta seems good.
I'd like to add some opinions about your issue, together with the other issue you posted to the Redis repo.
ps. I removed 1. as you already understand Redis' behavior; I was misled by the issue from your Redis repo. Sorry about it. ;)
@nykolaslima Btw, I'm afraid that Jedis users are already familiar with returnResource / returnBrokenResource. Removing returnBrokenResource will break backward compatibility. And I don't know whether it's better to let JedisPool ask Jedis for its state.
Updated wiki page: https://github.com/xetorthio/jedis/wiki/Getting-started Btw, we need to rewrite the wiki pages, since we don't need to explain replication in Getting-started.
@HeartSaVioR Actually I believe that having returnResource and returnBrokenResource is unnecessary. It's too easy to make a mistake and not call the right one. IMO, users should not need to know that something different has to be done when a resource is broken; I believe it should be the framework's responsibility to decide which one to call. About breaking compatibility, I think it's not an issue given what we are planning. About the automatic reconnection, I think it would be great too!
I also think like @nykolaslima; we already support Closeable. Regarding reconnections, Jedis already supports a very basic reconnection scheme, but there are lots of cases that haven't been tested correctly.
@HeartSaVioR I don't have a lot of experience in socket programming either. But if we make sure that whenever there's a connection error we clear any leftover buffers upon reconnection, we shouldn't see the errors @mpaoletta is seeing, like getting wrong values for a request. I'm not sure about this though, it's just an assumption. That's why I need to run further tests to see how this can be improved. Suggestions and ideas are more than welcome 😄
Just a few comments: we don't usually use the DEBUG SLEEP and KEYS commands in production; they were our last resort trying to reproduce the issue, after stressing the app for 3 hours without it breaking a sweat. The fact remains that stop-the-world events on Redis, caused by long transactions or other things, can make Jedis return wrong values.

I understand that there is already a solution with returnBrokenResource, and I believe that it could be handled in a nicer way from the API side. But what I truly believe is that if you already know the connection is in a broken state, you should at least throw some kind of exception when performing any operation on the Jedis instance, because returning wrong values just because someone isn't using the approved usage pattern is just wrong. I'm ok with Jedis throwing exceptions all over the place, or having slow performance because of bad configuration; I can work with that, it's something that will get my attention. But having someone get someone else's credit card statement is really hard to justify.

Also, I wasn't using try-with-resources because we are using Scala, where it's not available, and my first attempt to reproduce it wasn't very nice (and since I was fixing a critical production issue, it had to be done very quickly). And I wasn't using a pool in the first place, because I'm creating a connection from the actor instance to avoid contention when getting the connection from the pool. So fixing this at the pool level IMHO is not correct, because it makes the Jedis instance unusable by itself. Hope this helps.
We cannot assume that everybody will use try-with-resources. In my opinion we should do two things:
@nykolaslima @mpaoletta when I mentioned that Jedis supports Closeable
@nykolaslima is there a reason we should keep returnResource in JedisPool? I think it's best to have only one way of handling proper cleanup, and as I see it we can just use close().
I totally agree. I think we can work on two fronts for this issue.
@marcosnils Btw, maybe we could get rid of the broken state altogether instead of keeping it. Please correct me if I'm wrong. :)
@HeartSaVioR I don't think I understand you. We are already throwing an exception, but the problem is that… I believe we all agree about deprecating returnResource / returnBrokenResource.
I mean, why do we leave the Jedis instance in an invalid state when we could restore it ASAP by disconnecting (or maybe reconnecting)? @xetorthio @marcosnils @nykolaslima
I agree with that Heart! +1
Found another hurdle. ;( The Connection layer would be easy, as I stated earlier. Another option is applying the command pattern, though it's not good for memory consumption and performance?
Actually, we need to add a precondition to most methods if we decide to throw an exception when Jedis is broken.
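A sketch of what such a precondition could look like (`GuardedConn` is a hypothetical class, not actual Jedis code): every command checks the broken flag and fails fast, and reconnecting clears the flag, so a caller can never silently read from a desynchronized stream:

```java
// Hypothetical guarded connection illustrating the precondition idea.
class GuardedConn {
    private boolean broken = false;

    void markBroken() { broken = true; }

    // In real code this would close the old socket and open a fresh one,
    // throwing away any stale buffered replies in the process.
    void reconnect() { broken = false; }

    String get(String key) {
        // The precondition under discussion: fail fast instead of
        // returning a wrong value from a broken connection.
        if (broken) {
            throw new IllegalStateException(
                    "connection is broken; reconnect() before reusing it");
        }
        return "value:" + key;
    }
}
```

With a guard like this, the failure mode from the original report becomes a loud exception rather than someone else's data.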
@HeartSaVioR you're correct. By removing the broken state we wouldn't need those preconditions. @nykolaslima what do you think about this? I've tried to explain it in my previous comments, but maybe I haven't transmitted the correct idea. Regarding reconnections, what about overriding sendCommand? The method in BinaryJedis would be something like this:

```java
@Override
protected Connection sendCommand(Command cmd) {
  try {
    return super.sendCommand(cmd);
  } catch (JedisConnectionException jce) {
    if (isInMulti || isInWatch) {
      resetState();
    }
    throw jce;
  }
}
```
@marcosnils I still don't understand why we want to keep returnResource.
I believe we could have close() decide this for us. (I don't know if I'm expressing my idea right here; I'm sorry about my bad English.) What is the reason for keeping returnResource?
@marcosnils @nykolaslima If we agree on that, hiding JedisPool's returnXXXResource() from users is the only remaining issue for us.
@nykolaslima I'm getting dizzy here. I've never said that we should keep returnResource. @HeartSaVioR I see what you're saying. Another idea I have is that we can pass a reference of the pool to the Jedis instance. Thoughts?
@marcosnils I'm sorry, I understood it wrong. Now I understand what you are saying. So we all agree that we should make returnResource and returnBrokenResource non-public in favor of close(). I'm not sure if we need to reset the state if something goes wrong with sendCommand.
@nykolaslima what if you're not using JedisPool and you have a simple program that deals with just one redis instance? In that case, if you get an exception, you need to re-instantiate Jedis to continue working. Nowadays we're just flagging the connection as broken, and we only use that for resource cleanup. As @HeartSaVioR said, I like the idea of removing the broken state altogether and allowing a Jedis instance to recover itself by reconnecting. Whenever a user tries to execute a command on a disconnected socket, we can throw an exception until the client reconnects. What do you think?
Hmm, command pattern or cross-reference. Neither is really good, though. ;( This issue is no longer simple, so let's resolve it iteratively. Since we're discussing on a closed issue, how about creating a new issue and migrating our conversation?
I agree with you Heart. Let's do it in parts and open a new issue to discuss it!
To improve usability we're changing the visibility of `returnResource` and `returnBrokenResource` in favor of `jedis.close()`, which automatically returns the resource to the pool accordingly. As `Pool` and `Jedis` are currently in different packages, I've created a JedisPoolAbstract class to provide a bridge between the two implementations.
I've opened two PRs to fix this.
Great @marcosnils 🎉
Conflicts:
	src/main/java/redis/clients/jedis/BinaryJedis.java
	src/main/java/redis/clients/jedis/BinaryShardedJedis.java
We're having a severe app/driver bug in our heavily concurrent app.
This is a single redis installation with over 4M keys. To reproduce the problem, we execute a long-running command from redis-cli, like KEYS * or DEBUG SLEEP (this is just to reproduce the problem). Our client code (Jedis based) starts to throw errors: first socket read timeouts, which are expected, and then typical concurrency errors in Jedis, like trying to cast a string into a binary. Worst of all, sometimes it seems to return wrong keys. These errors keep occurring even after the long-running command finishes.
We were using a Jedis instance in each akka actor instance, so it shouldn't be a concurrency issue in our code (we were fairly sure the redis connection wasn't shared between threads), but we changed to JedisPool nevertheless. The problem remained (we are using Jedis 2.6.2).
The usual pattern is (this is Scala):
We use the following commands:
get, exists, hgetAll, getSet, expire, ttl, multi/exec, set, del.
Are you aware of any concurrency issues with any of these commands?
We are currently evaluating other drivers, but we would really appreciate any tips regarding this.
TIA,
Martin Paoletta