-
Notifications
You must be signed in to change notification settings - Fork 728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server/balancer: guarantee replicas are safe if possible #473
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we add an option to tell user about this explicitly.
I didn't see any new options, is it removed in 443d700 ?
} | ||
|
||
// selectBestPeer returns the best peer in other stores. | ||
func (r *replicaChecker) selectBestPeer(region *regionInfo, filters ...Filter) (*metapb.Peer, float64) { | ||
filters = append(filters, r.filters...) | ||
// Add some must have filters. | ||
filters = append(filters, newStateFilter(r.opt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a healthFilter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
healthFilter
will be used after the best store has been found, or the best store can be filtered if it is temporarily unavailable.
if bestStore == nil || compareStoreScore(store, score, bestStore, bestScore) > 0 { | ||
bestStore = store | ||
bestScore = score | ||
} | ||
} | ||
|
||
if bestStore == nil { | ||
if bestStore == nil || filterTarget(bestStore, r.filters) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not check r.filters
along with filters
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.filters
are used to filter temporarily unavailable stores. We may choose an unsafe store if we filter those temporarily unavailable stores before calculating the score.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
|
||
// If store 7 is not available, we wait. | ||
tc.setStoreDown(7) | ||
c.Assert(sb.Schedule(cluster), IsNil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we balance 1->6 instead if 7 is down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not safe to balance 1->6, we don't need to balance immediately if 7 is down.
// If store 7 is not available, we wait. | ||
tc.setStoreDown(7) | ||
c.Assert(sb.Schedule(cluster), IsNil) | ||
c.Assert(sb.cache.get(1), IsTrue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not look reasonable to mark 1 as 'not able to balance' when the actual reason is 7 is down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Store 1 is not able to balance temporarily when 7 is down, we can skip it for a while.
@overvenus Yes, the option has been removed for simplicity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
if len(v) == 0 { | ||
return "" | ||
} | ||
id += s.getLabelValue(k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use v here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concat the values into an id, so id will be more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I mean is id += v
, no need to call s.getLabelValue(k)
twice.
for _, k := range keys { | ||
v := s.getLabelValue(k) | ||
if len(v) == 0 { | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to consider the latter key which value is not empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should support missing label cases, it's error prone and hard to explain the behavior.
If we want to update the configurations or labels, we can just turn off the balance temporarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should forbid the missing label in TiKV, and if TiKV reports the label without a value, we should log an error and don't add this label too.
Rest LGTM |
return cluster.putStore(newStoreInfo(store)) | ||
// Check location labels. | ||
for _, k := range c.s.cfg.Replication.LocationLabels { | ||
if v := s.getLabelValue(k); len(v) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should check before update the store info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's ok, it is a clone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
We can not do balance safely if we have only 3 data centers, because of tikv/tikv#1468.
So, the current implementation guarantee safety if we have more than 3 data centers, but we will still balance if we have only 3 data cetners.