Pool auto reconnect #132

Closed
offline opened this issue Dec 15, 2016 · 14 comments

offline commented Dec 15, 2016

Can we replace the current connection in the pool with a new one on the error "Lost connection to MySQL server during query ([Errno 104] Connection reset by peer)"?

Member

jettify commented Dec 15, 2016

I don't follow. What is your use case? Do you mean tolerating a server reboot?

Author

offline commented Dec 15, 2016

Because of timeouts, network issues, a MySQL restart, or other things that can happen, the client loses its connection, and when it then tries to execute a query, the query fails. Usually the driver replaces the connection (torndb does, for example).

Member

jettify commented Dec 15, 2016

Could you provide a failing test?

We already have logic that checks the connection before returning it to the user; see: https://github.com/aio-libs/aiomysql/blob/master/aiomysql/pool.py#L147-L158

There is also a test for the timeout case:

aiomysql/tests/test_pool.py

Lines 439 to 459 in 93aa3e5

@pytest.mark.run_loop
def test_drop_connection_if_timedout(pool_creator, connection_creator, loop):
    conn = yield from connection_creator()
    yield from _set_global_conn_timeout(conn, 2)
    yield from conn.ensure_closed()
    try:
        pool = yield from pool_creator(minsize=3, maxsize=3)
        # sleep, more then connection timeout
        yield from asyncio.sleep(3, loop=loop)
        conn = yield from pool.acquire()
        cur = yield from conn.cursor()
        # query should not throw exception OperationalError
        yield from cur.execute('SELECT 1;')
        pool.release(conn)
        pool.close()
        yield from pool.wait_closed()
    finally:
        # setup default timeouts
        conn = yield from connection_creator()
        yield from _set_global_conn_timeout(conn, 28800)
        yield from conn.ensure_closed()

Author

offline commented Dec 15, 2016

Maybe the timeout case is handled properly, but other exceptions are not? Here is a traceback:

Traceback (most recent call last):
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 558, in _read_bytes
    data = yield from self._reader.readexactly(num_bytes)
  File "/usr/lib/python3.5/asyncio/streams.py", line 656, in readexactly
    raise self._exception
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 558, in _read_bytes
    data = yield from self._reader.readexactly(num_bytes)
  File "/usr/lib/python3.5/asyncio/streams.py", line 670, in readexactly
    block = yield from self.read(n)
  File "/usr/lib/python3.5/asyncio/streams.py", line 627, in read
    yield from self._wait_for_data('read')
  File "/usr/lib/python3.5/asyncio/streams.py", line 457, in _wait_for_data
    yield from self._waiter
  File "/usr/lib/python3.5/asyncio/futures.py", line 361, in __iter__
    yield self  # This tells Task to wait for completion.
  File "/usr/lib/python3.5/asyncio/tasks.py", line 296, in _wakeup
    future.result()
  File "/usr/lib/python3.5/asyncio/futures.py", line 274, in result
    raise self._exception
  File "/usr/lib/python3.5/asyncio/selector_events.py", line 669, in _read_ready
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiohttp/server.py", line 265, in start
    yield from self.handle_request(message, payload)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiohttp/web.py", line 96, in handle_request
    resp = yield from handler(request)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiohttp/web_urldispatcher.py", line 658, in __iter__
    resp = yield from method()
  File "/var/www/moments-aiohttp/handlers/moments.py", line 24, in get
    """, (aid, datetime.datetime.utcnow()))
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/cursors.py", line 239, in execute
    yield from self._query(query)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/cursors.py", line 460, in _query
    yield from conn.query(q)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 398, in query
    yield from self._read_query_result(unbuffered=unbuffered)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 582, in _read_query_result
    yield from result.read()
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 835, in read
    first_packet = yield from self.connection._read_packet()
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 520, in _read_packet
    packet_header = yield from self._read_bytes(4)
  File "/var/www/moments-aiohttp/.env/lib/python3.5/site-packages/aiomysql/connection.py", line 564, in _read_bytes
    raise OperationalError(2013, msg) from e
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query ([Errno 104] Connection reset by peer)')

Member

jettify commented Dec 15, 2016

Could you provide a failing test case (and the versions of aiomysql and Python) so I can reproduce and debug the issue? Right now it is not clear where the problem is, or even whether it is in the pool at all.

As for your request, the current pool does not support reconnection. We can make it more robust and more tolerant, but reconnection should live in a separate pool implementation or a helper function.
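
A helper function of the kind mentioned above could look roughly like the following sketch. It is not part of aiomysql: `run_with_retry`, `FakePool`, `FakeConn`, and `LostConnection` are hypothetical names made up for illustration, and `LostConnection` stands in for the `OperationalError` with code 2013 seen in the traceback. The toy pool exists only so the example runs without a database.

```python
import asyncio


class LostConnection(Exception):
    """Hypothetical stand-in for OperationalError(2013, 'Lost connection ...')."""


class FakeConn:
    """Toy connection that either works or fails on every query."""

    def __init__(self, broken):
        self.broken = broken

    async def execute(self, sql):
        if self.broken:
            raise LostConnection(sql)
        return 1


class FakePool:
    """Toy pool: hands out queued connections, then creates fresh ones."""

    def __init__(self, conns):
        self._free = list(conns)
        self.discarded = 0

    async def acquire(self):
        return self._free.pop(0) if self._free else FakeConn(broken=False)

    def release(self, conn):
        self._free.append(conn)

    def discard(self, conn):
        # A real pool would close the socket here instead of reusing it.
        self.discarded += 1


async def run_with_retry(pool, operation, retries=1):
    """Run operation(conn); on a lost connection, drop it and try a new one."""
    for attempt in range(retries + 1):
        conn = await pool.acquire()
        try:
            result = await operation(conn)
        except LostConnection:
            pool.discard(conn)
            if attempt == retries:
                raise
        else:
            pool.release(conn)
            return result


async def demo():
    pool = FakePool([FakeConn(broken=True)])
    value = await run_with_retry(pool, lambda c: c.execute("SELECT 1"))
    return value, pool.discarded


result = asyncio.run(demo())
```

Retrying like this is only safe for statements that can be repeated, since a write may already have been applied on the server before the connection dropped.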

Author

offline commented Dec 22, 2016

I was unable to create a test for it, but it seems to happen only with Amazon RDS. I guess they don't close inactive connections properly.

Member

jettify commented Dec 22, 2016

Thanks for the update!

To make the pool more robust, we could also add connection recycling: after a specified time, just close the connection and create a new one instead.
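
The recycling idea can be sketched as below. This is an illustration only, not aiomysql's implementation: `FakeConn`, `drop_stale`, and the `recycle` parameter are hypothetical names, and the timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class FakeConn:
    """Toy connection that remembers when it was last used."""

    def __init__(self, last_used):
        self.last_used = last_used
        self.closed = False

    def close(self):
        self.closed = True


def drop_stale(free, recycle, now):
    """Close and remove free connections idle for more than `recycle` seconds.

    A non-positive `recycle` disables recycling (an assumption of this sketch).
    """
    kept = deque()
    dropped = 0
    for conn in free:
        if recycle > 0 and now - conn.last_used > recycle:
            conn.close()
            dropped += 1
        else:
            kept.append(conn)
    free.clear()
    free.extend(kept)
    return dropped


free = deque([FakeConn(last_used=0.0), FakeConn(last_used=90.0)])
dropped = drop_stale(free, recycle=60, now=100.0)
```

With `recycle` set below the server's or proxy's idle timeout, a connection is replaced before the other side has a chance to drop it silently.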

Author

offline commented Dec 22, 2016

Good idea

@eth3lbert

Agreed. We need a pool_recycle option in create_engine, like SQLAlchemy's, to handle RDS disconnects.


wencan commented Apr 1, 2017

pool_recycle
+1


wencan commented Apr 6, 2017

@offline
My temporary solution:

import asyncio

engine = None
_task = None

async def init_engine(loop=None):
    # create engine

    global _task
    loop = loop or asyncio.get_event_loop()
    # start the background keep-alive task
    _task = loop.create_task(keep_engine())


async def close_engine():
    _task.cancel()

    # close engine


async def keep_engine():
    while True:
        # ping once a minute so the connection is not left idle
        async with engine.acquire() as conn:
            await conn.connection.ping()

        await asyncio.sleep(60)


tzongw commented Oct 14, 2017

For a lost connection, I wonder whether StreamReader.at_eof() works. Maybe also check StreamReader.exception()?

aiomysql/aiomysql/pool.py

Lines 147 to 158 in e15115d

def _fill_free_pool(self, override_min):
    # iterate over free connections and remove timeouted ones
    free_size = len(self._free)
    n = 0
    while n < free_size:
        conn = self._free[-1]
        if conn._reader.at_eof():
            self._free.pop()
            conn.close()
        else:
            self._free.rotate()
        n += 1
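
To illustrate the suggestion, here is a small sketch of a check that looks at both `StreamReader.at_eof()` and `StreamReader.exception()`. The helper name `reader_is_usable` is hypothetical and the readers are fed by hand rather than by a real socket; it is not a patch for the pool.

```python
import asyncio


def reader_is_usable(reader):
    """Return False if the stream reached EOF or recorded an exception."""
    return not reader.at_eof() and reader.exception() is None


async def demo():
    # Readers are created inside a running event loop.
    healthy = asyncio.StreamReader()

    closed = asyncio.StreamReader()
    closed.feed_eof()  # what the transport does when the peer closes cleanly

    reset = asyncio.StreamReader()
    reset.set_exception(ConnectionResetError(104, "Connection reset by peer"))

    return [reader_is_usable(r) for r in (healthy, closed, reset)]


states = asyncio.run(demo())
```

Both checks only reflect what the event loop has already observed on the socket. If the peer disappears without sending anything, neither flag changes until the next read or write fails.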

Member

jettify commented Oct 15, 2017

I created a PR with pool recycling here: #216

jettify closed this as completed Feb 20, 2018

Contributor

djetelina commented Feb 28, 2018

Hey, I've stumbled upon a similar issue that concerns pool reconnects, except I wasn't using Amazon RDS but our own nginx proxy in front of a Galera cluster.

I saw very strange behavior: when I disconnected one of the nodes behind the proxy, my benchmark dropped from 720 requests per second to 4, with no errors surfacing, and throughput never returned to normal unless I started that node back up or restarted the pool. I tried pretty much every option available in aiomysql, as well as setting the connection's last_used manually in a finally block after execute, and nothing worked.

The fix was to include proxy_timeout 3s; in the proxy config, but our Ops team considers this a 'hack' and it's not something we want to keep long-term. I don't know where to start on a failing test for this, but if anyone finds the time, it's definitely worth investigating. I've already spent almost a week on this and have to move on for now. Plus, I don't understand the readers and writers well enough :).

Proxy config:

daemon                    off;
worker_processes          50;
error_log                 /dev/stderr;
pid                       /tmp/nginx.pid;
worker_rlimit_nofile      8192;
events {
    use epoll;
    worker_connections  4096;
    accept_mutex off;
}

stream {

    log_format basic '$remote_addr [$time_local] '
                     '$protocol $status $bytes_sent $bytes_received '
                     '$session_time';

    access_log       /dev/stdout basic;

    upstream read {
        hash $remote_port consistent;
        server node1:3306 max_fails=3 fail_timeout=5s;
        server node2:3306 max_fails=3 fail_timeout=5s;
        server node3:3306 max_fails=3 fail_timeout=5s;
    }


    server {
        listen 3307;
        proxy_pass read;
        proxy_timeout 3s;   # This fixes the issue but shouldn't be present here
        proxy_connect_timeout 300ms; # detect failure quickly
    }
}
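
One client-side option that may make this kind of stall visible, assuming it shows up as an await that never completes, is to wrap each query in `asyncio.wait_for`. The sketch below is untested against the setup described above; `with_timeout` and `stalled_query` are hypothetical names, and the stalled query is simulated with a long sleep.

```python
import asyncio


async def with_timeout(coro, seconds):
    """Await coro; raise asyncio.TimeoutError if it takes longer than `seconds`."""
    return await asyncio.wait_for(coro, timeout=seconds)


async def stalled_query():
    # Stand-in for a query whose reply never arrives through the proxy.
    await asyncio.sleep(3600)


async def demo():
    try:
        await with_timeout(stalled_query(), 0.05)
    except asyncio.TimeoutError:
        # In real code, this is where the connection would be closed
        # rather than returned to the pool.
        return "timed out"
    return "completed"


outcome = asyncio.run(demo())
```

This turns a silent slowdown into an explicit error, but it does not by itself explain or fix the underlying behavior.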
