
Timeout migrations that take too long to run #11704

Merged: 3 commits, Jun 28, 2022

@dstufft (Member) commented Jun 27, 2022

This will prevent migrations that take too long from locking up the site.

We might also want to set timeouts on the site itself, but I'll leave that to another PR.

This can be overridden on a per-migration basis by issuing the same SET inside the migration itself. That also makes it obvious when a migration is opting out of the defaults.
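A minimal sketch of such an override, assuming PostgreSQL and an Alembic migration. `override_timeouts` is a hypothetical helper, not part of Warehouse or Alembic. It only builds the SQL, which the migration would then pass to `op.execute`. The sketch deliberately uses `SET LOCAL` instead of the plain `SET` mentioned above: a plain `SET` that gets committed stays in effect for the rest of the database session, so it could leak into later migrations in the same run. `SET LOCAL` reverts at the end of the current transaction.

```python
import re

# PostgreSQL accepts a bare integer (milliseconds for these two settings) or an
# integer with a time unit, e.g. '500ms', '30s', '5min'.
_DURATION = re.compile(r"\d+(ms|s|min|h|d)?")


def override_timeouts(*, statement_timeout=None, lock_timeout=None):
    """Return SET LOCAL statements that raise the timeouts for one migration.

    Hypothetical helper. Values are validated so that nothing other than a
    duration can be interpolated into the SQL.
    """
    statements = []
    for name, value in (
        ("statement_timeout", statement_timeout),
        ("lock_timeout", lock_timeout),
    ):
        if value is None:
            continue  # leave this setting at its session default
        value = str(value)
        if not _DURATION.fullmatch(value):
            raise ValueError(f"invalid {name}: {value!r}")
        statements.append(f"SET LOCAL {name} = '{value}'")
    return statements


# Inside a migration module this might be used roughly as:
#
#     from alembic import op
#
#     def upgrade():
#         for stmt in override_timeouts(statement_timeout="60s"):
#             op.execute(stmt)
#         ...  # the long-running data migration
```

Because `SET LOCAL` is scoped to the enclosing transaction, the override covers exactly one migration only when Alembic runs each migration in its own transaction (the `transaction_per_migration=True` option to `context.configure`). Otherwise it lasts until the end of the whole upgrade run.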

The lock_timeout (4s) limits how long a migration can sit idle waiting to acquire a lock. Without it, another long-running transaction could block the migration, and the migration, while queued for its lock, would in turn block every other transaction waiting behind it.

The statement_timeout will prevent any individual statement from taking more than 5s.
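A minimal sketch of how the two defaults could be applied to the migration session, assuming they are issued on the migration connection in `env.py` before any migration runs. The constant names and `apply_migration_timeouts` are hypothetical, and the PR's actual code may differ. Since `SET` is session-scoped, these limits affect only the migration connection and not the site's own database connections. That is why timeouts for the site itself would need separate work.

```python
# Hypothetical defaults mirroring the limits described in this PR.
MIGRATION_LOCK_TIMEOUT = "4s"       # longest a statement may wait to acquire a lock
MIGRATION_STATEMENT_TIMEOUT = "5s"  # longest any single statement may run


def apply_migration_timeouts(execute):
    """Set session-level timeouts through `execute`, a callable taking SQL text.

    Plain SET (not SET LOCAL) keeps the values, once committed, for the rest
    of the session, i.e. for every migration run on this connection.
    """
    execute(f"SET lock_timeout = '{MIGRATION_LOCK_TIMEOUT}'")
    execute(f"SET statement_timeout = '{MIGRATION_STATEMENT_TIMEOUT}'")


# Roughly how env.py might call it (names assumed, not taken from Warehouse):
#
#     with engine.connect() as connection:
#         apply_migration_timeouts(lambda sql: connection.execute(text(sql)))
#         context.configure(connection=connection, target_metadata=metadata)
#         with context.begin_transaction():
#             context.run_migrations()
```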

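When a migration exceeds one of these limits, it fails like this. Here, a migration running SELECT pg_sleep(10) trips the 5s statement_timeout: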

$ docker-compose run web python -m warehouse db upgrade head
Creating warehouse_web_run ... done
{"logger": "alembic.runtime.migration", "level": "INFO", "event": "Context impl PostgresqlImpl.", "thread": 139981115451200}
{"logger": "alembic.runtime.migration", "level": "INFO", "event": "Will assume transactional DDL.", "thread": 139981115451200}
{"logger": "alembic.runtime.migration", "level": "INFO", "event": "Running upgrade 8bee9c119e41 -> a09fe6af295f, Migrate Existing Data for Release.is_prerelease", "thread": 139981115451200}
Traceback (most recent call last):
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.QueryCanceled: canceling statement due to statement timeout


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/warehouse/src/warehouse/cli/db/__init__.py", line 31, in alembic_lock
    yield alembic_config
  File "/opt/warehouse/src/warehouse/cli/db/upgrade.py", line 29, in upgrade
    alembic.command.upgrade(alembic_config, revision, **kwargs)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/command.py", line 322, in upgrade
    script.run_env()
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/script/base.py", line 569, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 94, in load_python_file
    module = load_module_py(module_id, path)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 110, in load_module_py
    spec.loader.exec_module(module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/opt/warehouse/src/warehouse/migrations/env.py", line 68, in <module>
    run_migrations_online()
  File "/opt/warehouse/src/warehouse/migrations/env.py", line 62, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/runtime/environment.py", line 853, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/runtime/migration.py", line 623, in run_migrations
    step.migration_fn(**kw)
  File "/opt/warehouse/src/warehouse/migrations/versions/a09fe6af295f_migrate_existing_data_for_release_is_.py", line 39, in upgrade
    op.execute("SELECT pg_sleep(10)")
  File "<string>", line 8, in execute
  File "<string>", line 3, in execute
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/operations/ops.py", line 2414, in execute
    return operations.invoke(op)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/operations/base.py", line 394, in invoke
    return fn(self, operation)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/operations/toimpl.py", line 207, in execute_sql
    operations.migration_context.impl.execute(
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/ddl/impl.py", line 202, in execute
    self._exec(sql, execution_options)
  File "/opt/warehouse/lib/python3.10/site-packages/alembic/ddl/impl.py", line 195, in _exec
    return conn.execute(construct, multiparams)
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1306, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.QueryCanceled) canceling statement due to statement timeout

[SQL: SELECT pg_sleep(10)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/warehouse/src/warehouse/__main__.py", line 18, in <module>
    sys.exit(warehouse())
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/warehouse/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/warehouse/lib/python3.10/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/opt/warehouse/src/warehouse/cli/db/upgrade.py", line 26, in upgrade
    with alembic_lock(
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/warehouse/src/warehouse/cli/db/__init__.py", line 34, in alembic_lock
    connection.execute("SELECT pg_advisory_unlock(hashtext('alembic'))")
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1291, in execute
    return self._exec_driver_sql(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1595, in _exec_driver_sql
    ret = self._execute_context(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
    raise exception
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/opt/warehouse/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InternalError: (psycopg2.errors.InFailedSqlTransaction) current transaction is aborted, commands ignored until end of transaction block

[SQL: SELECT pg_advisory_unlock(hashtext('alembic'))]
(Background on this error at: https://sqlalche.me/e/14/2j85)
ERROR: 1
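The second half of the traceback is a knock-on failure. Once the statement timeout aborts the transaction, PostgreSQL rejects every further command on that connection until a rollback, so the SELECT pg_advisory_unlock(hashtext('alembic')) issued during cleanup fails with InFailedSqlTransaction. Below is a minimal sketch of a lock helper that avoids this. It assumes a connection object with `execute(sql)` and `rollback()` methods. `advisory_lock` is hypothetical and is not Warehouse's `alembic_lock`. Rolling back first is safe here because a rollback does not release session-level advisory locks, so the explicit unlock is still needed and now succeeds.

```python
import contextlib

LOCK_SQL = "SELECT pg_advisory_lock(hashtext('alembic'))"
UNLOCK_SQL = "SELECT pg_advisory_unlock(hashtext('alembic'))"


@contextlib.contextmanager
def advisory_lock(connection):
    """Hold a session-level advisory lock while the body runs (hypothetical)."""
    connection.execute(LOCK_SQL)
    try:
        yield connection
    except Exception:
        # A failed statement (e.g. a statement_timeout) leaves the transaction
        # aborted; PostgreSQL ignores further commands until it is rolled back.
        connection.rollback()
        raise
    finally:
        # Session-level advisory locks survive a rollback, so release explicitly.
        connection.execute(UNLOCK_SQL)
```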

@dstufft requested a review from @ewdurbin on June 27, 2022 23:01
@dstufft (Member, Author) commented Jun 28, 2022

I just want to explicitly call this out:

With this PR, migrations will FAIL if they sit waiting on a lock for more than 4s.

With this PR, migrations will FAIL if any individual statement takes more than 5s.
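When one of these failures shows up in a deploy or CI log, the PostgreSQL error code usually tells the two apart. Below is a small sketch that assumes psycopg2, which exposes the code as `pgcode`. SQLAlchemy wraps the driver error and keeps the original on `.orig`, as with the OperationalError in the traceback above. `timeout_kind` is a hypothetical helper. An expired lock_timeout raises 55P03 (lock_not_available), and an expired statement_timeout raises 57014 (query_canceled). Neither code is exclusive, though. 55P03 is also raised by NOWAIT lock requests, and 57014 is also raised by manual cancellations such as pg_cancel_backend().

```python
LOCK_NOT_AVAILABLE = "55P03"  # raised when lock_timeout expires (also NOWAIT)
QUERY_CANCELED = "57014"      # raised when statement_timeout expires (also manual cancels)


def timeout_kind(exc):
    """Guess which migration timeout fired (hypothetical helper).

    Accepts a psycopg2 error (which carries `pgcode`) or a SQLAlchemy
    DBAPIError wrapping one (which carries the driver error on `orig`).
    Returns "lock_timeout", "statement_timeout", or None.
    """
    original = getattr(exc, "orig", None) or exc
    code = getattr(original, "pgcode", None)
    if code == LOCK_NOT_AVAILABLE:
        return "lock_timeout"
    if code == QUERY_CANCELED:
        return "statement_timeout"
    return None
```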

@ewdurbin (Member) left a comment

Love this rather simple and direct solution! As a default value these are stellar, and the ease of overriding is great.

Can we add docs on how to manage these on a per-migration basis to the development/database-migrations doc and to the migration template?

Once that's complete, this is 👍🏼

@ewdurbin (Member) commented:

Oh oh oh! Missed your comment while writing mine. Add those links to the migrations docs too 😂
