-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parallel migration #8857
parallel migration #8857
Conversation
f47c0b3
to
6af20d8
Compare
@hsimon-hashicorp what is the prospect of getting this one merged ? |
I'll see what I can do! I've asked for a review from our engineer on rotation - please feel free to ping me again if need be, as well. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for submitting this PR!
I've added some review comments. I think a one abstract/general question I have is around testing. There's a duplicate PR here #8077 for this feature, and it would be great if we could port some of the testing logic to the feature branch, or alternatively add some custom testing that would verify the parallelism itself is working as intended.
Thanks again!
command/operator_migrate.go
Outdated
@@ -44,6 +45,7 @@ type migratorConfig struct { | |||
StorageSource *server.Storage `hcl:"-"` | |||
StorageDestination *server.Storage `hcl:"-"` | |||
ClusterAddr string `hcl:"cluster_addr"` | |||
MaxParallel string `hcl:"max_parallel"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_parallel
might cause a bug with the start
flag. Could we please make max_parallel
a flag instead of a config option and error if both these flags are specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in favor of a command flag.
command/operator_migrate_test.go
Outdated
@@ -55,11 +56,11 @@ func TestMigration(t *testing.T) { | |||
if err != nil { | |||
t.Fatal(err) | |||
} | |||
|
|||
maxParallel := "10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow how this tests the parallelism feature. Could you please elaborate on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to say after 2,5 years ;). Don't look like this makes any sens here. I set the parallel value to 10 only for DFS test and rest is set to 1.
defer permitPool.Release() | ||
err := cb(ctx, key) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error doesn't make it to the outside dfsScan
function. Something should be done with the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you suggest? Sending via channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could work, or using an ErrorGroup should work I think. I can't think of any particular standardization at the moment that I'm particularly partial toward, just that there should be some error handling here. I can circle back with some other folks to see if there's a particular preference or standardization we have as well.
@gites are you still active, and would you be willing on answering the questions? |
286ead5
to
78452f6
Compare
78452f6
to
35ca46b
Compare
@gites is attempting to deploy a commit to the HashiCorp Team on Vercel. A member of the Team first needs to authorize it. |
@HridoyRoy @hoerup I updated the pr and answered some questions. |
command/operator_migrate.go
Outdated
@@ -135,6 +144,11 @@ func (c *OperatorMigrateCommand) Run(args []string) int { | |||
} | |||
c.logger = logging.NewVaultLogger(log.LevelFromString(c.flagLogLevel)) | |||
|
|||
if (c.flagStart != "") && c.flagParallel != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please could we add some sanity check on the c.flagParallel
value before this to ensure it's >= 1
? You could modify this if
statement to check c.flagParallel > 1
after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added flagParallel
sanity check for positive int32. Running a migration with flagParallel = 1
is not really a parallel migration so I changed the if
here to check for c.flagParallel >= 2
@@ -144,8 +144,13 @@ func (c *OperatorMigrateCommand) Run(args []string) int { | |||
} | |||
c.logger = logging.NewVaultLogger(log.LevelFromString(c.flagLogLevel)) | |||
|
|||
if (c.flagStart != "") && c.flagParallel != 1 { | |||
c.UI.Error("Error: flag -start and -parallel can't be used together") | |||
if c.flagParallel < 1 || c.flagParallel > 2147483647 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use math.MaxInt32
instead of a magic number (here and below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of fact, we'll likely not gain much from running math.maxInt32
goroutines in parallel and it may not even be possible. How do you feel about limiting flagParallel
to some lower number. Say, 100?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be fine as just math.maxInt32
. We can leave it up to the user to properly tune, but maybe we should specifically call that out in the docs.
Name: "parallel", | ||
Default: 1, | ||
Target: &c.flagParallel, | ||
Usage: "Use go rutines when migrating. Can speed up the migration " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage: "Use go rutines when migrating. Can speed up the migration " + | |
Usage: "Use go routines when migrating. Can speed up the migration " + |
Hello! I would like to finish this PR(it looks like almost everything is already here, there are little things left). @gites what do you think? and thanks for all this work! |
@msheldyakov go ahead and finish it! Good luck! |
@gites and @msheldyakov There are some comments and questions that we would like to sort out before merging the PR. Would you be interested to respond/address those comments? |
Hi folks, are we still working on this? Otherwise, our engineers may be able to fork it and make some progress and get it merged. Thanks in advance! |
Hey @hsimon-hashicorp, it may well be that the original contributor just hasn't the time now to continue this and it would be a shame to see their efforts wasted. I'm happy to push my fork of @gites' work into a HashiCorp branch so we can try and wrap it up. I'll leave this comment up for a few days in case @gites and other co-authors want to finish it themselves, and failing a response, or a response to suggest we 'go ahead' I think we should do just that. :) |
Oh I see that someone else already asked and was granted permission to finish this off (in the comments above) #8857 (comment). Since nothing is happening on the PR I will start trying to move into 'into' HashiCorp. |
@hsimon-hashicorp @peteski22 Yes, go ahead |
This PR speed up keys migration using the
vault operator migrate
command.Each key is migrated in a goroutine (the actual part that moves the data) and thus the blocking operation of copying data is handled separately from the part that iterates over objects in Vault. In addition, the number of parallel goroutines migrating the data can be controlled by setting the
max_parallel
option in the migration config.Thanks to this change I manage to cut down Vault storage migration time between Consul and GCS from ~3h 30m to ~5m for around 90k of entries and
max_parallel=100
.If that looks promising I will update the docs too.