maglev: Parallelize calculation of permutations #14597
Conversation
Signed-off-by: Martynas Pumputis <[email protected]>
force-pushed from 3a74462 to cc90307
go func(from int) {
    to := from + batchSize
    if to > bCount {
        to = bCount
    }
    for i := from; i < to; i++ {
        offset, skip := getOffsetAndSkip(backends[i], m)
        perm[i*int(m)] = offset % m
        for j := uint64(1); j < m; j++ {
            perm[i*int(m)+int(j)] = (perm[i*int(m)+int(j-1)] + skip) % m
        }
    }
    wg.Done()
}(g)
As mentioned in an offline discussion, using a semaphore here would also constrain the number of goroutines. However, we want to avoid creating a goroutine per backend, which can be expensive when there are many backends: it would put pressure on the Go scheduler to schedule a potentially large number of goroutines, even short-lived ones, and consume a much larger memory footprint.
From my quick reading of the code, it splits the work into chunks and has a goroutine process each chunk. I think this is the preferable approach, as it creates at most X goroutines rather than len(backends).
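For reference, here is a minimal sketch of the semaphore-based alternative mentioned above, assuming golang.org/x/sync/semaphore and an illustrative work callback; it bounds concurrency but still spawns one goroutine per backend:

package maglev

import (
    "context"
    "runtime"

    "golang.org/x/sync/semaphore"
)

// forEachBackend runs work(i) for every backend index, spawning one
// goroutine per index but using a weighted semaphore to cap the number
// of goroutines running concurrently at the number of CPUs. Unlike the
// batching in the diff above, this still creates len(backends)
// goroutines, which is the overhead the comment worries about.
func forEachBackend(nBackends int, work func(i int)) {
    ctx := context.Background()
    maxWorkers := int64(runtime.NumCPU())
    sem := semaphore.NewWeighted(maxWorkers)
    for i := 0; i < nBackends; i++ {
        _ = sem.Acquire(ctx, 1) // blocks until a slot frees up
        go func(i int) {
            defer sem.Release(1)
            work(i)
        }(i)
    }
    // Acquiring the full weight waits for all outstanding goroutines.
    _ = sem.Acquire(ctx, maxWorkers)
}

Batching per CPU, as the PR does, avoids that per-backend goroutine creation and keeps both scheduler load and memory footprint down.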
The optimization gives ~5x improvement in time:
Nice! Did you also assess the overhead / time for a smaller number of backends and/or a smaller M? Mainly wondering whether the heuristic should be adapted to only kick in when the overhead of spawning extra goroutines brings a measurable win... on the other hand, I don't think people would configure a very small M.
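Purely as an illustration of the heuristic wondered about here (not something the thread settles on), such a guard could look like the following; smallTableThreshold and numPermutationWorkers are made-up names:

package maglev

import "runtime"

// smallTableThreshold is a made-up cutoff, for illustration only.
const smallTableThreshold = 1 << 16

// numPermutationWorkers falls back to a single goroutine when the total
// work (backends * M) is tiny, and otherwise uses one worker per CPU.
// The follow-up below notes the parallel path still gave ~4x even with
// smaller M, so such a guard may well be unnecessary in practice.
func numPermutationWorkers(bCount int, m uint64) int {
    if uint64(bCount)*m < smallTableThreshold {
        return 1
    }
    return runtime.NumCPU()
}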
With smaller Ms I got 4x.
test-me-please
retest-netnext
retest-net-next
CI net-next hit #12511.
retest-net-next
retest-gke
Create a number of goroutines to calculate Maglev backend permutations. We choose the number to be equal to the number of CPUs available, as the calculation doesn't block and is completely CPU-bound, so adding more goroutines would introduce overhead instead of an improvement.

The optimization gives ~5x improvement in time:

BEFORE OPTIMIZATION
> go test -v -check.v -check.b -check.btime 5s -check.bmem [..]
MaglevTestSuite.BenchmarkGetMaglevTable    5    1352437020 ns/op    1049633299 B/op    9 allocs/op

AFTER OPTIMIZATION
> go test -v -check.v -check.b -check.btime 5s -check.bmem [..]
MaglevTestSuite.BenchmarkGetMaglevTable    50   271517785 ns/op     1049633901 B/op    12 allocs/op

Signed-off-by: Chris Tarazi <[email protected]>
Signed-off-by: Martynas Pumputis <[email protected]>
All checks have passed. Pushed the commit.
force-pushed from cc90307 to 36860ae
Create a number of goroutines to calculate Maglev backend permutations.
We choose the number to be equal to the number of CPUs available, as the
calculation doesn't block and is completely CPU-bound, so adding more
goroutines would introduce overhead instead of an improvement.
The optimization gives ~5x improvement in time (see the benchmark results above).
Signed-off-by: Chris Tarazi <[email protected]>
Signed-off-by: Martynas Pumputis <[email protected]>
Related: #14397.
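For completeness, here is a minimal, self-contained sketch of the pattern this commit message describes: one batch of backends per available CPU, joined with a WaitGroup. getOffsetAndSkip below is only a stand-in to keep the example compilable (the real helper in Cilium is different), and m is assumed to be a prime >= 2 as usual for Maglev.

package maglev

import (
    "hash/fnv"
    "runtime"
    "sync"
)

// getOffsetAndSkip is an illustrative stand-in for the real hash-based
// helper; it exists here only to keep the sketch self-contained.
func getOffsetAndSkip(backend string, m uint64) (uint64, uint64) {
    h := fnv.New64a()
    h.Write([]byte(backend))
    sum := h.Sum64()
    offset := sum % m
    skip := (sum>>32)%(m-1) + 1
    return offset, skip
}

// getPermutation fills perm (of length len(backends)*M) with one Maglev
// permutation row per backend. The backends are split into one batch per
// available CPU, since the work never blocks and is entirely CPU-bound.
func getPermutation(backends []string, m uint64, perm []uint64) {
    bCount := len(backends)
    nWorkers := runtime.NumCPU()
    batchSize := (bCount + nWorkers - 1) / nWorkers // ceiling division
    if batchSize == 0 {
        batchSize = 1
    }

    var wg sync.WaitGroup
    for g := 0; g < bCount; g += batchSize {
        wg.Add(1)
        go func(from int) {
            defer wg.Done()
            to := from + batchSize
            if to > bCount {
                to = bCount
            }
            for i := from; i < to; i++ {
                offset, skip := getOffsetAndSkip(backends[i], m)
                perm[i*int(m)] = offset % m
                for j := uint64(1); j < m; j++ {
                    perm[i*int(m)+int(j)] = (perm[i*int(m)+int(j-1)] + skip) % m
                }
            }
        }(g)
    }
    wg.Wait()
}

Usage would be along the lines of perm := make([]uint64, len(backends)*int(m)) followed by getPermutation(backends, m, perm).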