Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server/balancer: guarantee replicas are safe if possible #473

Merged
merged 6 commits into from
Jan 21, 2017

Conversation

huachaohuang
Copy link
Contributor

@huachaohuang huachaohuang commented Jan 13, 2017

We can not do balance safely if we have only 3 data centers, because of tikv/tikv#1468.
So, the current implementation guarantee safety if we have more than 3 data centers, but we will still balance if we have only 3 data cetners.

Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we add an option to tell user about this explicitly.

I didn't see any new options, is it removed in 443d700 ?

}

// selectBestPeer returns the best peer in other stores.
func (r *replicaChecker) selectBestPeer(region *regionInfo, filters ...Filter) (*metapb.Peer, float64) {
filters = append(filters, r.filters...)
// Add some must have filters.
filters = append(filters, newStateFilter(r.opt))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a healthFilter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

healthFilter will be used after the best store has been found, or the best store can be filtered if it is temporarily unavailable.

if bestStore == nil || compareStoreScore(store, score, bestStore, bestScore) > 0 {
bestStore = store
bestScore = score
}
}

if bestStore == nil {
if bestStore == nil || filterTarget(bestStore, r.filters) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not check r.filters along with filters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r.filters are used to filter temporarily unavailable stores. We may choose an unsafe store if we filter those temporarily unavailable stores before calculating the score.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.


// If store 7 is not available, we wait.
tc.setStoreDown(7)
c.Assert(sb.Schedule(cluster), IsNil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we balance 1->6 instead if 7 is down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not safe to balance 1->6, we don't need to balance immediately if 7 is down.

// If store 7 is not available, we wait.
tc.setStoreDown(7)
c.Assert(sb.Schedule(cluster), IsNil)
c.Assert(sb.cache.get(1), IsTrue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not look reasonable to mark 1 as 'not able to balance' when the actual reason is 7 is down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store 1 is not able to balance temporarily when 7 is down, we can skip it for a while.

@huachaohuang
Copy link
Contributor Author

@overvenus Yes, the option has been removed for simplicity.

Copy link
Member

@overvenus overvenus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@disksing
Copy link
Contributor

LGTM

if len(v) == 0 {
return ""
}
id += s.getLabelValue(k)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use v here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concat the values into an id, so id will be more appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I mean is id += v, no need to call s.getLabelValue(k) twice.

for _, k := range keys {
v := s.getLabelValue(k)
if len(v) == 0 {
return ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to consider the latter key which value is not empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should support missing label cases, it's error prone and hard to explain the behavior.
If we want to update the configurations or labels, we can just turn off the balance temporarily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should forbid the missing label in TiKV, and if TiKV reports the label without a value, we should log an error and don't add this label too.

@siddontang
Copy link
Contributor

Rest LGTM

return cluster.putStore(newStoreInfo(store))
// Check location labels.
for _, k := range c.s.cfg.Replication.LocationLabels {
if v := s.getLabelValue(k); len(v) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should check before update the store info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok, it is a clone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huachaohuang huachaohuang merged commit aca3a4a into master Jan 21, 2017
@huachaohuang huachaohuang deleted the huachao/location branch January 21, 2017 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants