
parallel migration #8857

Closed · wants to merge 4 commits

Conversation

@gites (Contributor) commented Apr 27, 2020

This PR speeds up key migration using the vault operator migrate command. Each key is migrated in a goroutine (the part that actually moves the data), so the blocking operation of copying data is handled separately from the part that iterates over objects in Vault. In addition, the number of parallel goroutines migrating the data can be controlled by setting the max_parallel option in the migration config.

Thanks to this change I managed to cut down Vault storage migration time between Consul and GCS from ~3h 30m to ~5m for around 90k entries with max_parallel=100.

If this looks promising, I will update the docs too.
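For illustration, here is a minimal sketch of the approach described above: each key is copied in its own goroutine, with a buffered channel acting as a semaphore to cap concurrency at max_parallel. The migrateKey and migrateAll names are hypothetical stand-ins, not the actual Vault implementation.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// migrateKey is a hypothetical stand-in for the code that reads one
// entry from the source backend and writes it to the destination.
func migrateKey(ctx context.Context, key string) error {
	fmt.Println("migrating", key)
	return nil
}

// migrateAll copies each key in its own goroutine, using a buffered
// channel as a semaphore so at most maxParallel copies run at once.
func migrateAll(ctx context.Context, keys []string, maxParallel int) error {
	sem := make(chan struct{}, maxParallel)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	for _, key := range keys {
		sem <- struct{}{} // blocks while maxParallel copies are in flight
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			defer func() { <-sem }() // free a slot for the next key
			if err := migrateKey(ctx, key); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = fmt.Errorf("migrating %q: %w", key, err)
				}
				mu.Unlock()
			}
		}(key)
	}
	wg.Wait()
	return firstErr
}

func main() {
	keys := []string{"secret/a", "secret/b", "secret/c"}
	if err := migrateAll(context.Background(), keys, 2); err != nil {
		fmt.Println("migration failed:", err)
	}
}
```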

@hoerup commented Aug 31, 2022

@hsimon-hashicorp what is the prospect of getting this one merged?

@heatherezell (Contributor) commented

> @hsimon-hashicorp what is the prospect of getting this one merged?

I'll see what I can do! I've asked for a review from our engineer on rotation - please feel free to ping me again if need be, as well. :)

@HridoyRoy (Contributor) left a comment

Hi, thanks for submitting this PR!

I've added some review comments. I think one general question I have is around testing. There's a duplicate PR (#8077) for this feature, and it would be great if we could port some of the testing logic to the feature branch, or alternatively add some custom testing that would verify the parallelism itself is working as intended.
Thanks again!
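As a sketch of the kind of parallelism test being asked for (not the PR's actual test code), one could record the peak number of in-flight callbacks with atomic counters and assert that it exceeds 1; the worker-pool body here stands in for the PR's parallel scan:

```go
package migrate

import (
	"context"
	"sync"
	"sync/atomic"
	"testing"
	"time"
)

// TestParallelism verifies that callbacks actually overlap by tracking
// a high-water mark of concurrently running callback invocations.
func TestParallelism(t *testing.T) {
	var inFlight, peak int64

	cb := func(ctx context.Context, key string) error {
		n := atomic.AddInt64(&inFlight, 1)
		defer atomic.AddInt64(&inFlight, -1)
		// Record a new high-water mark if this invocation beats it.
		for {
			p := atomic.LoadInt64(&peak)
			if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
				break
			}
		}
		time.Sleep(10 * time.Millisecond) // give goroutines time to overlap
		return nil
	}

	// Stand-in for the parallel scan: a bounded worker pool.
	keys := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	sem := make(chan struct{}, 4) // maxParallel = 4
	var wg sync.WaitGroup
	for _, k := range keys {
		sem <- struct{}{}
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			defer func() { <-sem }()
			_ = cb(context.Background(), k)
		}(k)
	}
	wg.Wait()

	if atomic.LoadInt64(&peak) < 2 {
		t.Fatalf("expected concurrent callbacks, peak was %d", peak)
	}
}
```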

@@ -44,6 +45,7 @@ type migratorConfig struct {
StorageSource *server.Storage `hcl:"-"`
StorageDestination *server.Storage `hcl:"-"`
ClusterAddr string `hcl:"cluster_addr"`
MaxParallel string `hcl:"max_parallel"`
Contributor:

max_parallel might cause a bug with the start flag. Could we please make max_parallel a flag instead of a config option and error if both these flags are specified?

Contributor Author (@gites):
Removed in favor of a command flag.

@@ -55,11 +56,11 @@ func TestMigration(t *testing.T) {
if err != nil {
t.Fatal(err)
}

maxParallel := "10"
Contributor:
I'm not sure I follow how this tests the parallelism feature. Could you please elaborate on this?

Contributor Author (@gites):

Hard to say after 2.5 years ;). It doesn't look like this makes any sense here. I set the parallel value to 10 only for the DFS test; the rest is set to 1.

defer permitPool.Release()
err := cb(ctx, key)
if err != nil {
	return err
Contributor:

I think this error doesn't make it out to the enclosing dfsScan function. Something should be done with the error.

Contributor Author (@gites):

What would you suggest? Sending it via a channel?

Contributor:

That could work, or using an errgroup should work, I think. I can't think of any particular standard at the moment that I'm partial toward, just that there should be some error handling here. I can circle back with some other folks to see if there's a particular preference or standard we have as well.
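For reference, a minimal sketch of the errgroup option mentioned above, assuming a reasonably recent golang.org/x/sync/errgroup (with SetLimit); cb and keys are illustrative names, not the PR's actual identifiers:

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	keys := []string{"secret/a", "secret/b", "secret/c"}

	// cb is a hypothetical per-key migration callback.
	cb := func(ctx context.Context, key string) error {
		fmt.Println("migrating", key)
		return nil
	}

	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(10) // bound the number of concurrent goroutines

	for _, key := range keys {
		key := key // capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			return cb(ctx, key) // the group captures any error returned here
		})
	}

	// Wait returns the first non-nil error from any goroutine, so a
	// failed copy surfaces to the caller of the scan instead of being lost.
	if err := g.Wait(); err != nil {
		fmt.Println("migration failed:", err)
	}
}
```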

@HridoyRoy HridoyRoy requested a review from peteski22 September 6, 2022 16:18
@hoerup commented Sep 13, 2022

@gites are you still active, and would you be willing to answer the questions?

@gites gites force-pushed the parallel-migration branch from 286ead5 to 78452f6 Compare September 13, 2022 19:05
@gites gites requested a review from a team September 13, 2022 19:05
@gites gites requested review from a team and austingebauer as code owners September 13, 2022 19:05
@gites gites requested review from a team, mdeggies and jeanneryan and removed request for a team September 13, 2022 19:05
@gites gites force-pushed the parallel-migration branch from 78452f6 to 35ca46b Compare September 13, 2022 19:31
@vercel bot commented Sep 13, 2022

@gites is attempting to deploy a commit to the HashiCorp Team on Vercel. A member of the Team first needs to authorize it.

@gites (Contributor Author) commented Sep 13, 2022

@HridoyRoy @hoerup I updated the PR and answered some questions.

@@ -135,6 +144,11 @@ func (c *OperatorMigrateCommand) Run(args []string) int {
}
c.logger = logging.NewVaultLogger(log.LevelFromString(c.flagLogLevel))

if (c.flagStart != "") && c.flagParallel != 1 {
Contributor:

Please could we add a sanity check on the c.flagParallel value before this to ensure it's >= 1? You could modify this if statement to check c.flagParallel > 1 after that.

Contributor Author (@gites):

Added a flagParallel sanity check for a positive int32. Running a migration with flagParallel = 1 is not really a parallel migration, so I changed the if here to check for c.flagParallel >= 2.

@@ -144,8 +144,13 @@ func (c *OperatorMigrateCommand) Run(args []string) int {
}
c.logger = logging.NewVaultLogger(log.LevelFromString(c.flagLogLevel))

if (c.flagStart != "") && c.flagParallel != 1 {
c.UI.Error("Error: flag -start and -parallel can't be used together")
if c.flagParallel < 1 || c.flagParallel > 2147483647 {
Contributor:

Can we use math.MaxInt32 instead of a magic number (here and below)?

Contributor:

As a matter of fact, we'll likely not gain much from running math.MaxInt32 goroutines in parallel, and it may not even be possible. How do you feel about limiting flagParallel to some lower number, say 100?

Contributor:

This should actually be fine as just math.MaxInt32. We can leave it up to the user to tune properly, but maybe we should specifically call that out in the docs.
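As an illustration of the suggestion, the bound check might look like the following sketch; validateParallel is a hypothetical helper following the field names in the diff above, not the code that was merged:

```go
package main

import (
	"fmt"
	"math"
	"os"
)

// validateParallel mirrors the check from the diff above, but uses
// math.MaxInt32 instead of the magic number 2147483647.
func validateParallel(flagParallel int) error {
	if flagParallel < 1 || flagParallel > math.MaxInt32 {
		return fmt.Errorf("value for -parallel must be between 1 and %d, got %d",
			math.MaxInt32, flagParallel)
	}
	return nil
}

func main() {
	if err := validateParallel(0); err != nil {
		fmt.Fprintln(os.Stderr, "Error:", err)
	}
}
```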

Name: "parallel",
Default: 1,
Target: &c.flagParallel,
Usage: "Use go rutines when migrating. Can speed up the migration " +
Contributor:

Suggested change:
- Usage: "Use go rutines when migrating. Can speed up the migration " +
+ Usage: "Use go routines when migrating. Can speed up the migration " +

@msheldyakov commented

Hello! I would like to finish this PR (it looks like almost everything is already here; only little things are left). @gites what do you think? And thanks for all this work!

@gites (Contributor Author) commented Nov 14, 2022

@msheldyakov go ahead and finish it! Good luck!

@hghaf099 (Contributor) commented

@gites and @msheldyakov There are some comments and questions that we would like to sort out before merging the PR. Would you be interested in responding to or addressing those comments?

@peteski22 (Contributor) commented

@gites I tried to help you out with some things in case you've been too busy to get back to this; I appreciate it's been a long-lived affair.

gites#1

@maxcoulombe maxcoulombe removed the request for review from a team January 19, 2023 16:00
@heatherezell (Contributor) commented

Hi folks, are we still working on this? Otherwise, our engineers may be able to fork it, make some progress, and get it merged. Thanks in advance!

@peteski22 (Contributor) commented

> Hi folks, are we still working on this? Otherwise, our engineers may be able to fork it, make some progress, and get it merged. Thanks in advance!

Hey @hsimon-hashicorp, it may well be that the original contributor just hasn't the time now to continue this, and it would be a shame to see their efforts wasted. I'm happy to push my fork of @gites' work into a HashiCorp branch so we can try and wrap it up.

I'll leave this comment up for a few days in case @gites and other co-authors want to finish it themselves; failing a response, or on a response suggesting we 'go ahead', I think we should do just that. :)

@peteski22 peteski22 mentioned this pull request Jan 24, 2023
@peteski22 (Contributor) commented

Oh, I see that someone else already asked and was granted permission to finish this off in the comments above (#8857 (comment)). Since nothing is happening on the PR, I will start trying to move it into HashiCorp.

@gites (Contributor Author) commented Jan 24, 2023

@hsimon-hashicorp @peteski22 Yes, go ahead.
