This repository was archived by the owner on Aug 23, 2023. It is now read-only.

Speculative queries #956

Merged
merged 9 commits into grafana:master from the speculativeQueries branch on Jul 25, 2018

Conversation

@shanson7 (Collaborator) commented Jun 28, 2018

For #954

I made speculation configurable, but one thing still changes even with speculation disabled: local requests now go via HTTP rather than being handled as a special case. This is both good and bad (good: they happen in parallel with peer requests and tracing just works; bad: a bit of extra overhead).

TODO:

  • Figure out partitions / shard group problem
  • Some functions (e.g. findSeries) don't use peerQuery; fix that
  • Try to more efficiently pre-allocate memory for initial results
  • Documentation

api/cluster.go Outdated
// metric api.cluster.speculative.requests is how many peer queries resulted in speculation
speculativeAttempts = stats.NewCounter32("api.cluster.speculative.attempts")

// metric api.cluster.speculative.requests is how many peer queries were improved due to speculation

@replay (Contributor) commented Jun 29, 2018

the metric name in the comment is not right

api/cluster.go Outdated

// peerQuerySpeculative takes a request and the path to request it on, then fans it out
// across the cluster, except to the local peer. If any peer fails requests to
// other peers are aborted. If 95% of peers have been heard from, and we are missing

Contributor:

speculationThreshold is configurable now
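
For illustration, here is a minimal, hedged sketch of the kind of speculative fan-out described in that doc comment. This is not the actual metrictank peerQuerySpeculative implementation; the names speculativeFanout and queryFn are made up, and error handling is reduced to "any peer failure aborts the query":

```go
// Sketch only: one request per shard group, and once `threshold` of the
// groups has answered, re-issue the still-missing requests speculatively
// (in the real code this would target another replica of the same group).
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// queryFn stands in for the per-peer HTTP call.
type queryFn func(ctx context.Context, group int) (string, error)

func speculativeFanout(ctx context.Context, groups []int, threshold float64, query queryFn) (map[int]string, error) {
	type result struct {
		group int
		data  string
		err   error
	}
	results := make(chan result, 2*len(groups)) // room for original + speculative answers
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // any failure or early return aborts the in-flight requests

	ask := func(g int) {
		go func() {
			data, err := query(ctx, g)
			select {
			case results <- result{g, data, err}:
			case <-ctx.Done():
			}
		}()
	}
	for _, g := range groups {
		ask(g)
	}

	out := make(map[int]string, len(groups))
	speculated := make(map[int]bool)
	for len(out) < len(groups) {
		select {
		case r := <-results:
			if r.err != nil {
				return nil, r.err // a hard failure from any peer aborts the whole query
			}
			out[r.group] = r.data
		case <-ctx.Done():
			return nil, ctx.Err()
		}
		// enough groups have answered: speculatively re-ask the stragglers once
		if float64(len(out)) >= threshold*float64(len(groups)) {
			for _, g := range groups {
				if _, done := out[g]; !done && !speculated[g] {
					speculated[g] = true
					ask(g)
				}
			}
		}
	}
	return out, nil
}

func main() {
	groups := []int{0, 1, 2, 3, 4}
	slowPeer := func(ctx context.Context, g int) (string, error) {
		time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) // simulate a slow / GC-ing peer
		return fmt.Sprintf("data for group %d", g), nil
	}
	out, err := speculativeFanout(context.Background(), groups, 0.8, slowPeer)
	fmt.Println(out, err)
}
```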

}
memberStartPartition := member.GetPartitions()[0]

if _, ok := membersMap[memberStartPartition]; !ok {

@replay (Contributor) commented Jun 29, 2018

This seems to determine whether a member is already in the map or not based on its first partition. But I can't see where we are sorting the partitions. If two MTs of a shard are configured to have the same partitions, but they are specified in a different order, wouldn't this break?

Collaborator Author:

That's true. IMO, it seems like the partitions should be sorted at startup. I could add a sort to the SetPartitions function.

Contributor:

I agree that sorting at startup makes the most sense; that way it only needs to be done once.

Contributor:

I just noticed that in cases where the partition IDs are not sorted in the config, it is important to first update to this version of MT without activating speculative queries, and only enable speculative queries once all MTs are on this version. Otherwise querying might temporarily be broken until all MTs are updated, because the older ones still return their partition IDs unsorted. But I guess there is nothing we can do about that.

Collaborator Author:

Well, we could sort the partitions of the cluster peers when we get them. As it stands right now, with or without speculation enabled, the partitions will need to be sorted. That is, unless I change the code as I mentioned in #956 (comment)
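
To make the sort-at-startup idea from this thread concrete, here is a hedged sketch (hypothetical member type, not the actual metrictank SetPartitions) of sorting the partitions when they are set, so that keying a map by GetPartitions()[0] no longer depends on config order:

```go
// Sketch only: sort partitions once, when they are set, so that later
// comparisons (e.g. keying a map by each shard group's first partition)
// are independent of the order used in the config.
package main

import (
	"fmt"
	"sort"
)

type member struct {
	partitions []int32
}

func (m *member) SetPartitions(parts []int32) {
	sorted := make([]int32, len(parts))
	copy(sorted, parts)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	m.partitions = sorted
}

func (m *member) GetPartitions() []int32 {
	return m.partitions
}

func main() {
	a, b := &member{}, &member{}
	a.SetPartitions([]int32{4, 0, 2})
	b.SetPartitions([]int32{2, 4, 0}) // same partitions, different config order
	// both now report the same first partition, so the map keys line up
	fmt.Println(a.GetPartitions()[0], b.GetPartitions()[0])
}
```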

@shanson7 force-pushed the speculativeQueries branch from fc5823a to ca4439c on June 29, 2018 20:54

@shanson7 (Collaborator Author):

Status update:

I'm using this on our prod setup. We get about 67 render requests/sec. We have 120 shard groups * 2 replicas (240 total peers). This results in about 8000 peer requests/sec.

Speculation kicks in on darn near all requests; with that many peers, it's highly likely that at least one of them is undergoing a GC pause at any given time.

Here's a snapshot with some of the data:
https://snapshot.raintank.io/dashboard/snapshot/Yy5adwlVIpEo7IyoS3QZj7QuikTsCw27

Of note:

  1. Speculation "win" percentage is frequently in the high 90s, indicating that most requests are aided by speculation.
  2. "Additional" requests to peers (requests that wouldn't be made if speculation were disabled) are frequently less than 5%. (Our speculation-threshold is set at 94%, so at most 7 additional HTTP requests are made per speculative query, making our worst case here ~5.8%; see the sketch below.)
  3. "Win %" seems to increase under load (but so does the p90 response time).

You can see in this snapshot when we upped the load:
https://snapshot.raintank.io/dashboard/snapshot/YAifupVWdmQ8nuIDa58qo4ciEmIWAgwk

Here's a comparison of our render response times before and after rollout (under very light load, 4 render reqs/sec):
https://snapshot.raintank.io/dashboard/snapshot/wBWTIbLGkF2GXiQIEjAjyzRFsG5HMqmB

You can see how much smoother and improved the median and p90 response times are.

@shanson7 changed the title from "WIP - Speculative queries" to "Speculative queries" on Jul 13, 2018

@replay (Contributor) commented Jul 20, 2018

looks great, but could you fix the test please? this might be what you want: afc60d2

@shanson7 (Collaborator Author):

I haven't verified it, but I'm pretty sure that the existing peer query mechanism required peers to have the same partitions as their comrades. Speculative peer queries absolutely require it. Is it worth supporting something like:

```go
if speculation-threshold < 1 {
   peerQuerySpeculatively
} else {
   peerQueryOldWay
}
```

@replay (Contributor) commented Jul 24, 2018

So far I've not heard of any case where people configured their MT partitions such that the partitions are not the same for all instances in a shard (that's what you mean, right?). I think it's probably not worth the additional complexity just to support this very rare edge case. What do you think, @woodsaj?

@woodsaj (Member) commented Jul 24, 2018

the docs state that when you have multiple replicas, partitions must be assigned in groups.
https://github.com/grafana/metrictank/blob/master/docs/clustering.md#combining-metrictanks-horizontal-scaling-plus-high-availability

@replay (Contributor) left a comment:

LGTM

@replay merged commit a7f011d into grafana:master on Jul 25, 2018
@shanson7 deleted the speculativeQueries branch on July 30, 2018 19:36

@Dieterbe (Contributor) commented Aug 6, 2018

> Here's a comparison of our render response times before and after rollout (under very light load, 4 render reqs/sec):
> https://snapshot.raintank.io/dashboard/snapshot/wBWTIbLGkF2GXiQIEjAjyzRFsG5HMqmB
> You can see how much smoother and improved the median and p90 response times are.

Interestingly:

  • Win % is way lower than before. Probably due to the very light load?
  • I see that there were 2 periods wherein speculation was enabled. In the first period, the latencies were still elevated. Is this because speculation wasn't fully rolled out across the cluster? (Additional HTTP is lower in the first period compared to the second.)

Otherwise, all these numbers look great :)

@shanson7 (Collaborator Author) commented Aug 6, 2018

> Win % is way lower than before. Probably due to the very light load?

Correct. Here's a snapshot under heavier load (with real user queries):
https://snapshot.raintank.io/dashboard/snapshot/IoV6Z91WYgh3VFmlB133fr2xVxnH9B4m

> I see that there were 2 periods wherein speculation was enabled. In the first period, the latencies were still elevated. Is this because speculation wasn't fully rolled out across the cluster? (Additional HTTP is lower in the first period compared to the second.)

Yes, this was still during the rollout.

@shanson7 (Collaborator Author) commented Aug 6, 2018

Of note, this optimization likely provides more value the larger the cluster is. With our 120-shard-group cluster, it's been great.
