Skip to content

LG-228 Make migrations safer and more resilient#2127

Merged
monfresh merged 1 commit intomasterfrom
mb-statement-timeout
May 1, 2018
Merged

LG-228 Make migrations safer and more resilient#2127
monfresh merged 1 commit intomasterfrom
mb-statement-timeout

Conversation

@monfresh
Copy link
Contributor

Why: We recently ran into an issue while deploying RC 56 to
production due to a migration needing more time than allowed by our
statement_timeout.

How:

Hi! Before submitting your PR for review, and/or before merging it, please
go through the following checklist:

  • For DB changes, check for missing indexes, check to see if the changes
    affect other apps (such as the dashboard), make sure the DB columns in the
    various environments are properly populated, coordinate with devops, plan
    migrations in separate steps.

  • For route changes, make sure GET requests don't change state or result in
    destructive behavior. GET requests should only result in information being
    read, not written.

  • For encryption changes, make sure it is compatible with data that was
    encrypted with the old code.

  • For secrets changes, make sure to update the S3 secrets bucket with the
    new configs in all environments.

  • Do not disable Rubocop or Reek offenses unless you are absolutely sure
    they are false positives. If you're not sure how to fix the offense, please
    ask a teammate.

  • When reading data, write tests for nil values, empty strings,
    and invalid formats.

  • When calling redirect_to in a controller, use _url, not _path.

  • When adding user data to the session, use the user_session helper
    instead of the session helper so the data does not persist beyond the user's
    session.

  • When adding a new controller that requires the user to be fully
    authenticated, make sure to add before_action :confirm_two_factor_authenticated.

@monfresh
Copy link
Contributor Author

I tested this locally by updating my local Figaro config:

make fast_setup

then creating a new migration file:

rails g migration StatementTimeoutTest

then copying this into it:

class StatementTimeoutTest < ActiveRecord::Migration[5.1]
  def change
    safety_assured { execute "SELECT pg_sleep(5);" }
  end
end

then running bundle exec rake db:migrate and watching it crash due to the statement_timeout being less than 5 seconds.

Then, I ran:

export MIGRATION_STATEMENT_TIMEOUT=6000
bundle exec rake db:migrate

and the migration passed.

@monfresh monfresh force-pushed the mb-statement-timeout branch from 1fbfa50 to 2832d1a Compare April 27, 2018 18:34
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to test this. Alternatively, to avoid Code Climate from complaining, we can use the slowpoke gem instead (which is where I got this code from), and only require this module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that won't work because the gem depends on a method in the main class that we don't want to require. So, our options are to tell Code Climate to ignore this file for test coverage or to build our own gem. I will try the former.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@monfresh
Copy link
Contributor Author

Note that this can't be merged until the infrastructure is updated to call the migration from here.

@monfresh monfresh requested a review from davemcorwin April 30, 2018 14:25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one thought here is that it would be nice to give this a more descriptive name, perhaps something like Deploy::MigrationDatabaseTimeout so that it is more clear what's being done in the initializer where we are using it.

Copy link
Contributor

@jgsmith-usds jgsmith-usds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me. My only concern is to make sure the timeout is long enough for reasonable migrations. I fear we'll run into it eventually and be surprised again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add validation somewhere that database_statement_timeout is present? Looks like if it's nil this will set the timeout to 0, not sure what effect that will have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Will add it to figaro.rb.

@brodygov
Copy link
Contributor

Thanks, looks like this would've avoided last week's issue.

Why is this blocked on the identity-devops PR? I would've thought this would just not be executed until that one is merged.

@brodygov
Copy link
Contributor

It's also worth noting that this doesn't address the race condition or rollback problems described in https://github.com/18F/identity-private/issues/2344, but it's a step in the right direction.

@monfresh
Copy link
Contributor Author

Do we think 60 seconds is a big enough statement timeout for migrations?

@monfresh monfresh force-pushed the mb-statement-timeout branch 2 times, most recently from 57aee08 to 6950960 Compare May 1, 2018 01:38
**Why**: We recently ran into an issue while deploying RC 56 to
production due to a migration needing more time than allowed by our
`statement_timeout`.

**How**:
- Add the `strong_migrations` gem, which will check your migrations
for unsafe usage and best practices. Read more here:
https://github.com/ankane/strong_migrations
- Allow configuring the statement_timeout in `database.yml` via Figaro
- Allow overriding the statement_timeout via an ENV var for migrations.
Code shamelessly copied from the Slowpoke gem:
https://github.com/ankane/slowpoke
- Add a `deploy/migrate` script so that it is maintained in this repo
as opposed to the devops repo.
@monfresh monfresh force-pushed the mb-statement-timeout branch from 6950960 to 1713043 Compare May 1, 2018 02:24
@monfresh monfresh changed the title Make migrations safer and more resilient LG-228 Make migrations safer and more resilient May 1, 2018
@monfresh monfresh merged commit 98febec into master May 1, 2018
@monfresh monfresh deleted the mb-statement-timeout branch May 1, 2018 02:37
@amoose amoose mentioned this pull request Jun 26, 2019
13 tasks
@amoose amoose mentioned this pull request Jul 3, 2019
13 tasks
@amoose amoose mentioned this pull request Jul 8, 2019
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants