Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement teardown for gh-ost library #526

Merged
merged 25 commits into from
Jan 14, 2018
Merged

Implement teardown for gh-ost library #526

merged 25 commits into from
Jan 14, 2018

Conversation

shlomi-noach
Copy link
Contributor

Re-PR for #479 by @nikhilmat , required for testing.

Reiterating original PR text:

This PR adds explicit cleanup to gh-ost for when the library is consumed by another application and might be executed multiple times (for example, by a webserver). This stops any infinite goroutines that have started once the migration has finished, closes any DB connections it has opened, and ensures that the migration context that is passed around is not being globally accessed - instead it instantiates a new one for each migration run and passes it around.

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing December 7, 2017 12:05 Inactive
@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Dec 10, 2017

@nikhilmat what I'm seeing so far is a significant increase in memory consumption. From around 9MB-10MB, 16MB at worst, on the master branch, I get as high as 128MB or even 1G on this branch.
I'm suggesting there's a memory leak.

On the larger migrations I see memory increase linearly in time. On smaller migrations it may not rise linearly.

See this graph for before/after deployment (black vertical line marks deployment time) of this branch:

screen shot 2017-12-10 at 15 11 22

@shlomi-noach
Copy link
Contributor Author

I've reverted the deployment to verify master branch does not suffer from same memory leak. Will observe for the next few days.

@shlomi-noach
Copy link
Contributor Author

Definitely a memory leak. Consider that I've reverted to master on the black line indicator:

screen shot 2017-12-11 at 07 54 46

@shlomi-noach
Copy link
Contributor Author

I suspect the reason is GetDB() ; by overriding caching of sql.DB object the execution creates I-don't-know-how-many (millions? more?) connection instances to the database.

@nikhilmat I'd like you to consider how to re-cache the DB object.

@nikhilmat
Copy link
Contributor

@shlomi-noach thanks for pointing this out! I'll get this fixed up.

I had originally implemented the cache on a per-migration basis by adding it to the migration context, but I understand that's not the best place for it given the future plans to serialize that object. I'm happy to implement a per-migration connection cache and test out that it fixes this memory leak - do you have any suggestions about where would be a good place?

If we don't want to put the cache on the migration context itself, my instincts point me towards accepting a migration identifier as a part of the GetDB() function which will be used to lookup an object from a global cache based on the connection string + the migration identifier.

@shlomi-noach
Copy link
Contributor Author

shlomi-noach commented Dec 12, 2017

@nikhilmat is a possible solution. Another idea: how about we cache until Migrator.TearDown() is called? Would that apply well for tests?

@nikhilmat
Copy link
Contributor

I had tried that approach originally in the first PR, but ran into an issue with concurrent migrations when:

  • Migration 1 starts and creates entries in the cache for Database A
  • Migration 2 starts and uses the cached entries created by the first migration for Database A
  • Migration 1 completes, clears the cache, and closes the connections it used.
  • Migration 2 fails because the connections it was using have been closed.

For this reason, I believe we'll need a migration specific database connection cache to prevent these kinds of failures. I'll give the first approach a shot and push it up to get some feedback!

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing January 4, 2018 12:49 Inactive
@shlomi-noach
Copy link
Contributor Author

sent to testing in production. I'll report back in a few days.

@nikhilmat
Copy link
Contributor

thank you @shlomi-noach! happy new year 😄

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing January 7, 2018 08:11 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing January 7, 2018 08:12 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production January 7, 2018 08:13 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production January 9, 2018 06:46 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production January 9, 2018 10:14 Inactive
@nikhilmat
Copy link
Contributor

@shlomi-noach really sorry about this, i realized there was a commit missing from my branch that actually calls throttler.Teardown as a part of the Migrator teardown. I've pushed it up to the nikhilmat/nm-teardown-contrib branch now. This set of changes should not be harmful without it, but a portion of the code is left unused without that commit.

Sorry again about missing that with the last push, was working with a few branches locally and got mixed up. I hope that doesn't throw too much of a wrench in the testing.

@shlomi-noach
Copy link
Contributor Author

@nikhilmat sorry, which commit is that? Please link.

@nikhilmat
Copy link
Contributor

nikhilmat commented Jan 10, 2018

@shlomi-noach Gusto@68e83e6

Currently the HEAD of nikhilmat/nm-teardown-contrib

@shlomi-noach
Copy link
Contributor Author

Thank you! Applied, sent to testing.

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_role=ghost_testing January 11, 2018 06:09 Inactive
@shlomi-noach
Copy link
Contributor Author

OK, this looks good in testing! Merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants