
Add add_column_safely and backfill_column_safely migration helpers #105

Closed
fatkodima wants to merge 1 commit into ankane:master from fatkodima:add_column_with_default_safely

@fatkodima
Contributor

No description provided.

@fatkodima force-pushed the add_column_with_default_safely branch 2 times, most recently from e50e382 to 8bb6bd3 on December 22, 2019 11:55
@fatkodima changed the title from "Add add_column_with_default_safely and update_column_in_batches migration helpers" to "Add add_column_safely and backfill_column_safely migration helpers" on Dec 24, 2019
@fatkodima force-pushed the add_column_with_default_safely branch from 8bb6bd3 to 724bafa on December 30, 2019 23:45
```ruby
start_pk = result.first[primary_key]

loop do
  finish_arel = table
```
Contributor

Does this also need to add .where(table[column_name].not_eq(value)) to avoid reprocessing rows during a retry? Same question for the actual update a few lines down.

Contributor Author

I don't think so. This method updates rows in primary key order, so on restart the process starts from the first unmatched row (start_pk), which is the row that was being processed when the update crashed, and moves forward from there.
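A minimal pure-Ruby sketch of the resume behavior described above, not the PR's code: an in-memory array stands in for the table, `Row` and `backfill_in_pk_order` are hypothetical names, a NULL (nil) column is assumed to mean "not yet updated", and `max_batches` only exists to simulate a crash.

```ruby
# Hypothetical in-memory stand-in for table rows; `nice` is the new column.
Row = Struct.new(:id, :nice)

# Hypothetical sketch: backfill `value` into `column` in primary-key order.
# Each batch starts at start_pk, the first row (by id) still needing an update,
# so a re-run after a crash skips every batch that already completed.
def backfill_in_pk_order(rows, column, value, batch_size:, max_batches: nil)
  batches = 0
  ordered = rows.sort_by(&:id)
  loop do
    start = ordered.find { |r| r[column].nil? }
    break unless start
    raise "simulated crash" if max_batches && batches >= max_batches

    # The batch runs from start_pk through the next batch_size rows by id.
    batch = ordered.select { |r| r.id >= start.id }.first(batch_size)
    batch.each { |r| r[column] = value if r[column].nil? }
    batches += 1
  end
  batches
end

rows = (1..10).map { |i| Row.new(i, nil) }
begin
  backfill_in_pk_order(rows, :nice, true, batch_size: 3, max_batches: 2)
rescue RuntimeError
  # The first run "crashes" after two batches.
end
done_before_resume = rows.count { |r| r.nice }
resumed_batches = backfill_in_pk_order(rows, :nice, true, batch_size: 3)
```

The first run updates rows 1-6 before the simulated crash; the second run needs only two batches (rows 7-9, then row 10) because it starts from the first row that is still NULL.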

Contributor

It's true that the process will pick up where it left off before the restart, but won't it unnecessarily update rows that were created after the column default was set but before the backfill completes (i.e. rows where the DB filled in the column default on insert)? Might not be much of an issue in practice though.

Contributor Author

> Might not be much of an issue in practice though.

Exactly this.

Owner

@ankane left a comment

Hey @fatkodima, thanks for this and the other PRs that follow it. Will hold off on reviewing dependent ones until we get this merged.

Overall, looks great! I still need to dig into the Arel. I was hoping there'd be a good way to dynamically create a model and use the existing code from the readme, but I'm not sure there's a good way to do this while reusing the existing connection.

```ruby
module StrongMigrations
  module Util
    def connection
      ActiveRecord::Base.connection
```
Owner

Needs to use the migration connection rather than the ActiveRecord::Base connection.


```ruby
class AddColumnSafely < TestMigration
  def change
    add_column_safely :users, :nice, :boolean, default: true, null: false
```
Owner

@ankane, Jan 4, 2020

Let's use a different name for the new column to avoid needing to change the AddColumnDefaultSafe migration (keeping the changeset minimal).

```ruby
end
end

def backfill_column_safely(table_name, column_name, value, batch_size: 1000)
```
Owner

Let's remove the batch size option to keep things simple. Also, the default should probably be 10,000, as in the existing instructions.

```ruby
end

def add_column_safely(table_name, column_name, type, **options)
  ensure_postgresql(__method__)
```
Owner

I think we can run the regular add_column method when it's not Postgres.

```ruby
reversible do |dir|
  dir.up do
    transaction do
      add_column(table_name, column_name, type, default: nil, **options) unless connection.column_exists?(table_name, column_name, type)
```
Owner

I don't think we should check column_exists?, as the migration should fail if a user tries to add a column that already exists.

```ruby
change_column_default(table_name, column_name, default)
end

default_after_type_cast = connection.type_cast(default)
```
Owner

What's the reason for typecasting?


```ruby
start_arel = table
  .project(table[primary_key])
  .where(table[column_name].not_eq(value))
```
Owner

Backfilling should only affect rows where the column is NULL. It would be good to include a test case for this situation (can do after the initial PR if that's easier).
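A pure-Ruby sketch of why the filter matters, modeling SQL's NULL comparison rules in memory rather than running a real query (`UserRow`, `sql_not_eq`, and the `matched_by_*` helpers are hypothetical names). For a non-nil value, Arel's `not_eq(value)` produces `column != value`, and in SQL any comparison with NULL is NULL, which a WHERE clause treats as no match.

```ruby
# Hypothetical rows; `nice` is the new boolean column being backfilled with true.
UserRow = Struct.new(:id, :nice)

# SQL-style `a != b`: NULL (nil here) when either side is NULL,
# and a WHERE clause treats NULL as "no match".
def sql_not_eq(a, b)
  return nil if a.nil? || b.nil?
  a != b
end

# Rows selected by WHERE column != value.
def matched_by_not_eq(rows, column, value)
  rows.select { |r| sql_not_eq(r[column], value) }.map(&:id)
end

# Rows selected by WHERE column IS NULL.
def matched_by_is_null(rows, column)
  rows.select { |r| r[column].nil? }.map(&:id)
end

rows = [
  UserRow.new(1, nil),   # existing row that still needs the backfill
  UserRow.new(2, false), # row explicitly set to false by the application
  UserRow.new(3, true)   # row inserted after the default was set
]

not_eq_ids = matched_by_not_eq(rows, :nice, true)
is_null_ids = matched_by_is_null(rows, :nice)
```

Under these semantics the `!=` filter matches only row 2, the one a backfill should leave alone, while the IS NULL filter matches only row 1, which is the case a test for this situation would want to cover.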

```diff
 add_column_default:
 "Adding a column with a non-null default causes the entire table to be rewritten.
-Instead, add the column without a default value, then change the default.
+Instead, add the column without a default value, backfill and then change the default.
```
Owner

The default gets changed before backfilling.
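To make that order concrete (add the column, change the default, then backfill, with the non-Postgres fallback suggested earlier in the review), here is a hypothetical pure-Ruby sketch that only lists the steps instead of running them. `add_column_safely_plan` is an illustrative name, and treating every adapter other than ActiveRecord's "PostgreSQL" the same way is an assumption.

```ruby
# Hypothetical sketch of the step order discussed in review; not the PR's code.
# Returns the steps as strings instead of running migrations.
def add_column_safely_plan(adapter_name, table, column, default)
  target = "#{table}.#{column}"

  # Outside Postgres, fall back to a plain add_column with the default.
  return ["add_column #{target} default=#{default.inspect}"] unless adapter_name == "PostgreSQL"

  [
    "add_column #{target} default=nil",                   # add without a default first
    "change_column_default #{target} #{default.inspect}", # new rows now get the default
    "backfill NULL #{target} with #{default.inspect}"     # then fill existing NULL rows
  ]
end

pg_steps = add_column_safely_plan("PostgreSQL", :users, :nice, true)
mysql_steps = add_column_safely_plan("Mysql2", :users, :nice, true)
```

Changing the default before the backfill means rows inserted while the backfill runs already get the default, so the backfill only has to cover rows that existed when the column was added.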

```ruby
end

safety_assured { execute(update_arel.to_sql) }
```

Owner

Need to throttle here with sleep(0.01) (we may need a global option for this later, since more powerful databases may need less throttling, but let's deal with that later).
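A minimal pure-Ruby sketch of that throttling, using the 10,000 batch size and `sleep(0.01)` from this review. `each_throttled_batch`, `BATCH_SIZE`, and `PAUSE_SECONDS` are hypothetical names, and an array of ids stands in for primary-key ranges; the `sleeper` argument is injectable only so the pauses can be observed.

```ruby
# Hypothetical sketch, not the PR's code: process ids in fixed-size batches
# and pause after each batch to give the database room to breathe.
BATCH_SIZE = 10_000
PAUSE_SECONDS = 0.01

def each_throttled_batch(ids, batch_size: BATCH_SIZE, pause: PAUSE_SECONDS, sleeper: method(:sleep))
  batches = 0
  ids.each_slice(batch_size) do |batch|
    yield batch         # e.g. run one UPDATE for this id range
    sleeper.call(pause) # throttle before the next batch
    batches += 1
  end
  batches
end

batch_sizes = []
pauses = []
batch_count = each_throttled_batch((1..25_000).to_a, sleeper: ->(s) { pauses << s }) do |batch|
  batch_sizes << batch.size
end
```

Pausing after every batch, including the last one, keeps the loop simple; a global option like the one mentioned above could later replace `PAUSE_SECONDS`.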

@fatkodima force-pushed the add_column_with_default_safely branch from 724bafa to d41728c on January 4, 2020 23:58
@fatkodima
Contributor Author

@ankane Thanks for the review! Updated with your suggestions.

@fatkodima force-pushed the add_column_with_default_safely branch 2 times, most recently from 0cb518e to 63da732 on January 5, 2020 13:42
@fatkodima
Contributor Author

Updated with master.

@fatkodima force-pushed the add_column_with_default_safely branch from 63da732 to bc50b5a on February 16, 2020 11:52
@fatkodima
Contributor Author

@ankane would you like to do another round of review on this?

@ankane
Owner

ankane commented Mar 2, 2020

Hey @fatkodima, sorry for the delay. Still thinking on #111.

@ankane added the waiting label on May 12, 2020
@ankane
Owner

ankane commented May 12, 2020

Decided that helper methods are better as a separate gem for now; see #111.

@ankane closed this on May 12, 2020


3 participants