scan_iter family commands gives inconsistent result when using Sentinel connection pool #3197

@agnesnatasya

Description

Version: What redis-py and what redis version is the issue happening on?
redis-py 4.5.0

Platform: What platform / version? (For example Python 3.5.1 on Windows 7 / Ubuntu 15.10 / Azure)
Python 3.10

Description: Description of your issue, stack traces from errors and code that reproduces the issue

The scan_iter family of commands (scan_iter, sscan_iter, hscan_iter, zscan_iter) may give inconsistent results when the client is created with a connection pool and multiple requests run concurrently.

Assume the following setup:

  • 2 replicas, host A and host B
  • SentinelConnectionPool is used to manage connections to the different servers
  • 2 concurrent scan_iter commands, each of which issues multiple scan commands. The scan commands issued by these two scan_iter commands are labelled scan (1) and scan (2) below.

What might happen is:

  1. scan (1) is issued
  2. scan (1) gets a connection from the pool
    • The pool is empty, so it creates a new connection
    • For the sentinel connection pool, creating a new connection means getting the next replica in the connection_pool.rotate_slaves rotation.
    • Since this can return any replica in the rotation, let's say it arbitrarily connects to host A
  3. scan (1) is executed on host A
  4. scan (2) is issued in the meantime
  5. scan (2) gets a connection from the pool
    • The pool has no available connection (1 connection was created, but it is still in use), so it creates a new connection
    • It gets the next replica in the connection_pool.rotate_slaves rotation.
    • Since this can return any replica in the rotation, let's say it arbitrarily connects to host B
  6. scan (2) is executed on host B
  7. scan (1) finishes. The connection to host A is put back into the pool
  8. scan (2) finishes. The connection to host B is put back into the pool
  9. scan (1) gets a connection from the pool again and receives the connection to host B (since the connection pool just pop()s the last element from the available connections)
  10. scan (1) is executed on host B
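The interleaving above can be replayed without a Redis server. The following is a minimal sketch, not redis-py code: `ToyConnection`, `ToySentinelPool`, and `replay_interleaving` are hypothetical names, and the pool only models the two behaviors described in the steps (rotating replicas for new connections, and reusing idle connections with `pop()`).

```python
from itertools import cycle


class ToyConnection:
    """Stand-in for a pooled connection; remembers which replica it reached."""

    def __init__(self, host):
        self.host = host


class ToySentinelPool:
    """Hypothetical model of a pool that rotates replicas and reuses LIFO."""

    def __init__(self, replicas):
        self._rotation = cycle(replicas)  # stand-in for the replica rotation
        self._available = []              # idle connections, used as a stack

    def get_connection(self):
        if self._available:
            return self._available.pop()  # last released, first reused
        return ToyConnection(next(self._rotation))

    def release(self, conn):
        self._available.append(conn)


def replay_interleaving():
    """Replay steps 1-10 and return the hosts hit by scan_iter (1)."""
    pool = ToySentinelPool(["host-a", "host-b"])
    hosts_for_scan_1 = []

    conn_1 = pool.get_connection()        # steps 1-3: scan (1) runs on host A
    hosts_for_scan_1.append(conn_1.host)
    conn_2 = pool.get_connection()        # steps 4-6: scan (2) runs on host B
    pool.release(conn_1)                  # step 7
    pool.release(conn_2)                  # step 8

    conn_1 = pool.get_connection()        # step 9: pop() hands back host B
    hosts_for_scan_1.append(conn_1.host)  # step 10
    pool.release(conn_1)
    return hosts_for_scan_1


print(replay_interleaving())  # ['host-a', 'host-b']
```

In this model the second call of scan (1) lands on a different host than the first one, purely because of the order in which the two connections were released.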

Step 9 is the bug. All scan commands coming from the same scan_iter command need to go to the same replica. This is because the 'state' of the scan_iter command is stored in the cursor, and different replicas will store keys in a different order.
Hence, if we use a cursor obtained from host A to do a scan on host B, we'll get an inconsistent result.
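To illustrate why a cursor is only meaningful on the replica that produced it, here is a toy model. It is a deliberate simplification: real Redis cursors are opaque values, not list offsets, and `toy_scan`, `scan_all`, `host_a`, and `host_b` are hypothetical names used only for this sketch. Both hosts hold the same four keys, just in a different internal order.

```python
def toy_scan(ordered_keys, cursor, count=2):
    """Return (next_cursor, batch); a next_cursor of 0 ends the iteration."""
    batch = ordered_keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(ordered_keys):
        next_cursor = 0
    return next_cursor, batch


# Same data on both replicas, different internal ordering (assumed for the demo).
host_a = ["k1", "k2", "k3", "k4"]
host_b = ["k3", "k4", "k1", "k2"]


def scan_all(hosts_per_call):
    """Run one iteration, sending the i-th scan call to hosts_per_call[i]."""
    cursor, seen = 0, []
    for host in hosts_per_call:
        cursor, batch = toy_scan(host, cursor)
        seen.extend(batch)
        if cursor == 0:
            break
    return seen


consistent = scan_all([host_a, host_a])  # every call goes to host A
mixed = scan_all([host_a, host_b])       # second call lands on host B

print(consistent)  # ['k1', 'k2', 'k3', 'k4']
print(mixed)       # ['k1', 'k2', 'k1', 'k2']
```

In the mixed run, k1 and k2 are returned twice and k3 and k4 are never returned, even though the iteration terminates normally with cursor 0.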

There are 3 different base implementations of a connection pool: ConnectionPool, SentinelConnectionPool and BlockingConnectionPool. All of them do something similar when getting a new connection from the pool: they create a 'dummy' connection object and call connection.connect(), which actually connects to the intended replica.

There are 4 different implementations of a connection, Connection, SSLConnection, SentinelManagedConnection, and SentinelManagedSSLConnection.

  • For SentinelManagedConnection and SentinelManagedSSLConnection, this is fixable by having SentinelConnectionPool maintain a mapping from the id of each scan_iter command to the host it previously issued commands to
  • For Connection and SSLConnection, the behavior depends on the connection class's .connect() implementation, which by default connects to the connection's own self.host and self.port.
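A rough sketch of the mapping idea from the first bullet, as a standalone model rather than a patch to redis-py. Everything here is an assumption made for illustration: `PinnedToyPool`, `ToyConn`, the `iter_id` argument, and `finish_iteration` are hypothetical and do not exist in the library.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class ToyConn:
    """Stand-in for a pooled connection bound to one replica."""
    host: str


class PinnedToyPool:
    """Hypothetical pool that pins each scan iteration to a single replica."""

    def __init__(self, replicas):
        self._rotation = cycle(replicas)
        self._idle = {}           # host -> list of idle connections
        self._host_for_iter = {}  # iteration id -> host chosen on first call

    def get_connection(self, iter_id):
        host = self._host_for_iter.get(iter_id)
        if host is None:
            # First scan of this iteration: pick a replica and remember it.
            host = next(self._rotation)
            self._host_for_iter[iter_id] = host
        idle = self._idle.get(host)
        if idle:
            return idle.pop()     # reuse only connections to the pinned host
        return ToyConn(host)

    def release(self, conn):
        self._idle.setdefault(conn.host, []).append(conn)

    def finish_iteration(self, iter_id):
        # Drop the pin once the cursor has returned to 0.
        self._host_for_iter.pop(iter_id, None)


def replay_with_pinning():
    """Replay the same interleaving; return the hosts hit by iteration 1."""
    pool = PinnedToyPool(["host-a", "host-b"])
    hosts = []
    conn_1 = pool.get_connection(iter_id=1)
    hosts.append(conn_1.host)
    conn_2 = pool.get_connection(iter_id=2)
    pool.release(conn_1)
    pool.release(conn_2)
    conn_1 = pool.get_connection(iter_id=1)  # stays on the pinned replica
    hosts.append(conn_1.host)
    pool.release(conn_1)
    pool.finish_iteration(1)
    return hosts


print(replay_with_pinning())  # ['host-a', 'host-a']
```

The sketch keeps idle connections grouped by host so that the pin can be honored without opening a new connection each time; whether the real pool should be restructured this way is an open design question.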
