
Commit 029b6ff

pracucci and pstibrany authored
Added flag to explicitly enable zone-awareness replication and added store-gateway support (#3200)
* Added flag to explicitly enable zone-awareness replication and added store-gateway support
  Signed-off-by: Marco Pracucci <[email protected]>
* Update docs/blocks-storage/store-gateway.template
  Signed-off-by: Marco Pracucci <[email protected]>
  Co-authored-by: Peter Štibraný <[email protected]>
* Update docs/guides/zone-replication.md
  Signed-off-by: Marco Pracucci <[email protected]>
  Co-authored-by: Peter Štibraný <[email protected]>
* Addressed review comments
  Signed-off-by: Marco Pracucci <[email protected]>
* Improved error message when there are not enough healthy instances for the replication set
  Signed-off-by: Marco Pracucci <[email protected]>
  Co-authored-by: Peter Štibraný <[email protected]>
1 parent 067412e commit 029b6ff

File tree: 14 files changed (+319, -351 lines)


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -7,11 +7,13 @@
  * `-experimental.distributor.user-subring-size` flag renamed to `-distributor.ingestion-tenant-shard-size`
  * `user_subring_size` limit YAML config option renamed to `ingestion_tenant_shard_size`
  * [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
+ * [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
  * [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-user` globally, or using per-user limit `max_queriers_per_user`), each user's requests will be handled by different set of queriers. #3113
  * [ENHANCEMENT] Added `cortex_query_frontend_connected_clients` metric to show the number of workers currently connected to the frontend. #3207
  * [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via `-distributor.sharding-strategy` CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090
  * [ENHANCEMENT] Ingester: added new metric `cortex_ingester_active_series` to track active series more accurately. Also added options to control whether active series tracking is enabled (`-ingester.active-series-enabled`, defaults to false), and how often this metric is updated (`-ingester.active-series-update-period`) and max idle time for series to be considered inactive (`-ingester.active-series-idle-timeout`). #3153
  * [ENHANCEMENT] Blocksconvert – Builder: download plan file locally before processing it. #3209
+ * [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
  * [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
  * [BUGFIX] Ruler: directories in the configured `rules-path` will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
  * [BUGFIX] Handle hash-collisions in the query path. #3192

docs/blocks-storage/store-gateway.md

Lines changed: 20 additions & 0 deletions
@@ -56,6 +56,16 @@ To protect from this, when an healthy store-gateway instance finds another insta

  This feature is called **auto-forget** and is built into the store-gateway.

+ ### Zone-awareness
+
+ The store-gateway replication optionally supports [zone-awareness](../guides/zone-replication.md). When zone-aware replication is enabled and the blocks replication factor is > 1, each block is guaranteed to be replicated across store-gateway instances running in different availability zones.
+
+ **To enable** the zone-aware replication for the store-gateways you should:
+
+ 1. Configure the availability zone for each store-gateway via the `-store-gateway.sharding-ring.instance-availability-zone` CLI flag (or its respective YAML config option)
+ 2. Enable blocks zone-aware replication via the `-store-gateway.sharding-ring.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to store-gateways, queriers and rulers.
+ 3. Rollout store-gateways, queriers and rulers to apply the new configuration
+
  ## Caching

  The store-gateway supports the following caches:

@@ -207,6 +217,16 @@ store_gateway:
    # CLI flag: -store-gateway.sharding-ring.tokens-file-path
    [tokens_file_path: <string> | default = ""]

+   # True to enable zone-awareness and replicate blocks across different
+   # availability zones.
+   # CLI flag: -store-gateway.sharding-ring.zone-awareness-enabled
+   [zone_awareness_enabled: <boolean> | default = false]
+
+   # The availability zone where this instance is running. Required if
+   # zone-awareness is enabled.
+   # CLI flag: -store-gateway.sharding-ring.instance-availability-zone
+   [instance_availability_zone: <string> | default = ""]
+
    # The sharding strategy to use. Supported values are: default,
    # shuffle-sharding.
    # CLI flag: -store-gateway.sharding-strategy
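For illustration, here is a minimal sketch of how the two new store-gateway options could be set in a config file. The `store_gateway.sharding_ring` nesting follows the reference snippet in the hunk above; the zone name is a placeholder, not part of this commit.

```yaml
# Hypothetical sketch: enable zone-aware blocks replication on a store-gateway.
store_gateway:
  sharding_ring:
    # Replicate each block across store-gateways running in different zones.
    zone_awareness_enabled: true
    # The zone this instance runs in (required when zone-awareness is enabled).
    # Use a distinct value per zone, e.g. "zone-a", "zone-b", "zone-c".
    instance_availability_zone: "zone-a"
```

As the docs above note, the `-store-gateway.sharding-ring.zone-awareness-enabled` setting must also be applied to queriers and rulers before rolling out.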

docs/blocks-storage/store-gateway.template

Lines changed: 10 additions & 0 deletions
@@ -56,6 +56,16 @@ To protect from this, when an healthy store-gateway instance finds another insta

  This feature is called **auto-forget** and is built into the store-gateway.

+ ### Zone-awareness
+
+ The store-gateway replication optionally supports [zone-awareness](../guides/zone-replication.md). When zone-aware replication is enabled and the blocks replication factor is > 1, each block is guaranteed to be replicated across store-gateway instances running in different availability zones.
+
+ **To enable** the zone-aware replication for the store-gateways you should:
+
+ 1. Configure the availability zone for each store-gateway via the `-store-gateway.sharding-ring.instance-availability-zone` CLI flag (or its respective YAML config option)
+ 2. Enable blocks zone-aware replication via the `-store-gateway.sharding-ring.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to store-gateways, queriers and rulers.
+ 3. Rollout store-gateways, queriers and rulers to apply the new configuration
+
  ## Caching

  The store-gateway supports the following caches:

docs/configuration/config-file-reference.md

Lines changed: 16 additions & 2 deletions
@@ -523,6 +523,11 @@ lifecycler:
    # CLI flag: -distributor.replication-factor
    [replication_factor: <int> | default = 3]

+   # True to enable the zone-awareness and replicate ingested samples across
+   # different availability zones.
+   # CLI flag: -distributor.zone-awareness-enabled
+   [zone_awareness_enabled: <boolean> | default = false]
+
    # Number of tokens for each ingester.
    # CLI flag: -ingester.num-tokens
    [num_tokens: <int> | default = 128]

@@ -559,8 +564,7 @@ lifecycler:
    # CLI flag: -ingester.tokens-file-path
    [tokens_file_path: <string> | default = ""]

-   # The availability zone of the host, this instance is running on. Default is
-   # an empty string, which disables zone awareness for writes.
+   # The availability zone where this instance is running.
    # CLI flag: -ingester.availability-zone
    [availability_zone: <string> | default = ""]

@@ -3663,6 +3667,16 @@ sharding_ring:
    # CLI flag: -store-gateway.sharding-ring.tokens-file-path
    [tokens_file_path: <string> | default = ""]

+   # True to enable zone-awareness and replicate blocks across different
+   # availability zones.
+   # CLI flag: -store-gateway.sharding-ring.zone-awareness-enabled
+   [zone_awareness_enabled: <boolean> | default = false]
+
+   # The availability zone where this instance is running. Required if
+   # zone-awareness is enabled.
+   # CLI flag: -store-gateway.sharding-ring.instance-availability-zone
+   [instance_availability_zone: <string> | default = ""]
+
    # The sharding strategy to use. Supported values are: default, shuffle-sharding.
    # CLI flag: -store-gateway.sharding-strategy
    [sharding_strategy: <string> | default = "default"]

docs/guides/zone-replication.md

Lines changed: 27 additions & 14 deletions
@@ -5,26 +5,39 @@ weight: 5
  slug: zone-aware-replication
  ---

- In a default configuration, time-series written to ingesters are replicated based on the container/pod name of the ingester instances. It is completely possible that all the replicas for the given time-series are held with in the same availability zone, even if the cortex infrastructure spans multiple zones within the region. Storing multiple replicas for a given time-series poses a risk for data loss if there is an outage affecting various nodes within a zone or a total outage.
+ Cortex supports data replication for different services. By default, data is transparently replicated across the whole pool of service instances, regardless of whether these instances are all running within the same availability zone (or data center, or rack) or in different ones.

- ## Configuration
+ It is completely possible that all the replicas for the given data are held within the same availability zone, even if the Cortex cluster spans multiple zones. Storing multiple replicas for a given data within the same availability zone poses a risk for data loss if there is an outage affecting various nodes within a zone or a full zone outage.

- Cortex can be configured to consider an availability zone value in its replication system. Doing so mitigates risks associated with losing multiple nodes within the same availability zone. The availability zone for an ingester can be defined on the command line of the ingester using the `ingester.availability-zone` flag or using the yaml configuration:
+ For this reason, Cortex optionally supports zone-aware replication. When zone-aware replication is **enabled**, replicas for the given data are guaranteed to span across different availability zones. This requires Cortex cluster to run at least in a number of zones equal to the configured replication factor.

- ```yaml
- ingester:
-   lifecycler:
-     availability_zone: "zone-3"
- ```
+ The Cortex services supporting **zone-aware replication** are:

- ## Zone Replication Considerations
+ - **[Distributors and Ingesters](#distributors-and-ingesters-time-series-replication)**
+ - **[Store-gateways](#store-gateways-blocks-replication)** ([blocks storage](../blocks-storage/_index.md) only)

- Enabling availability zone awareness helps mitigate risks regarding data loss within a single zone, some items need consideration by an operator if they are thinking of enabling this feature.
+ ## Distributors / Ingesters: time-series replication

- ### Minimum number of Zones
+ The Cortex time-series replication is used to hold multiple (typically 3) replicas of each time series in the **ingesters**.

- For cortex to function correctly, there must be at least the same number of availability zones as there is replica count. So by default, a cortex cluster should be spread over 3 zones as the default replica count is 3. It is safe to have more zones than the replica count, but it cannot be less. Having fewer availability zones than replica count causes a replica write to be missed, and in some cases, the write fails if the availability zone count is too low.
+ **To enable** the zone-aware replication for the ingesters you should:

- ### Cost
+ 1. Configure the availability zone for each ingester via the `-ingester.availability-zone` CLI flag (or its respective YAML config option)
+ 2. Rollout ingesters to apply the configured zone
+ 3. Enable time-series zone-aware replication via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Please be aware this configuration option should be set to distributors, queriers and rulers.

- Depending on the existing cortex infrastructure being used, this may cause an increase in running costs as most cloud providers charge for cross availability zone traffic. The most significant change would be for a cortex cluster currently running in a singular zone.
+ ## Store-gateways: blocks replication
+
+ The Cortex [store-gateway](../blocks-storage/store-gateway.md) (used only when Cortex is running with the [blocks storage](../blocks-storage/_index.md)) supports blocks sharding, used to horizontally scale blocks in a large cluster without hitting any vertical scalability limit.
+
+ To enable the zone-aware replication for the store-gateways, please refer to the [store-gateway](../blocks-storage/store-gateway.md#zone-awareness) documentation.
+
+ ## Minimum number of zones
+
+ For Cortex to function correctly, there must be at least the same number of availability zones as the replication factor. For example, if the replication factor is configured to 3 (default for time-series replication), the Cortex cluster should be spread at least over 3 availability zones.
+
+ It is safe to have more zones than the replication factor, but it cannot be less. Having fewer availability zones than replication factor causes a replica write to be missed, and in some cases, the write fails if the availability zones count is too low.
+
+ ## Impact on costs
+
+ Depending on the underlying infrastructure being used, deploying Cortex across multiple availability zones may cause an increase in running costs as most cloud providers charge for inter availability zone networking. The most significant change would be for a Cortex cluster currently running in a single zone.
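To make the ingester-side steps above concrete, here is a hedged YAML sketch of the same settings. The placement of `zone_awareness_enabled` under the lifecycler's ring block is an assumption based on the config-file-reference hunk earlier in this commit, and the zone name is a placeholder.

```yaml
# Hypothetical sketch: enable zone-aware time-series replication.
ingester:
  lifecycler:
    ring:
      # Default replication factor for time-series.
      replication_factor: 3
      # -distributor.zone-awareness-enabled: must now be enabled explicitly;
      # the same value must also be set on distributors, queriers and rulers.
      zone_awareness_enabled: true
    # -ingester.availability-zone: the zone this ingester runs in.
    availability_zone: "zone-b"
```

With a replication factor of 3, the cluster needs at least three zones (for example zone-a, zone-b and zone-c) for writes to succeed, per the minimum-zones note above.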

pkg/distributor/distributor_test.go

Lines changed: 1 addition & 1 deletion
@@ -963,7 +963,7 @@ func prepare(t *testing.T, cfg prepConfig) ([]*Distributor, []mockIngester, *rin
    addr := fmt.Sprintf("%d", i)
    ingesterDescs[addr] = ring.IngesterDesc{
      Addr: addr,
-     Zone: addr,
+     Zone: "",
      State: ring.ACTIVE,
      Timestamp: time.Now().Unix(),
      Tokens: []uint32{uint32((math.MaxUint32 / cfg.numIngesters) * i)},

pkg/ring/lifecycler.go

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ func (cfg *LifecyclerConfig) RegisterFlagsWithPrefix(prefix string, f *flag.Flag
    f.StringVar(&cfg.Addr, prefix+"lifecycler.addr", "", "IP address to advertise in consul.")
    f.IntVar(&cfg.Port, prefix+"lifecycler.port", 0, "port to advertise in consul (defaults to server.grpc-listen-port).")
    f.StringVar(&cfg.ID, prefix+"lifecycler.ID", hostname, "ID to register into consul.")
-   f.StringVar(&cfg.Zone, prefix+"availability-zone", "", "The availability zone of the host, this instance is running on. Default is an empty string, which disables zone awareness for writes.")
+   f.StringVar(&cfg.Zone, prefix+"availability-zone", "", "The availability zone where this instance is running.")
  }

  // Lifecycler is responsible for managing the lifecycle of entries in the ring.

pkg/ring/replication_strategy.go

Lines changed: 10 additions & 4 deletions
@@ -9,7 +9,7 @@ type ReplicationStrategy interface {
    // Filter out unhealthy instances and checks if there're enough instances
    // for an operation to succeed. Returns an error if there are not enough
    // instances.
-   Filter(instances []IngesterDesc, op Operation, replicationFactor int, heartbeatTimeout time.Duration) (healthy []IngesterDesc, maxFailures int, err error)
+   Filter(instances []IngesterDesc, op Operation, replicationFactor int, heartbeatTimeout time.Duration, zoneAwarenessEnabled bool) (healthy []IngesterDesc, maxFailures int, err error)

    // ShouldExtendReplicaSet returns true if given an instance that's going to be
    // added to the replica set, the replica set size should be extended by 1

@@ -25,7 +25,7 @@ type DefaultReplicationStrategy struct{}
  // - Filters out dead ingesters so the one doesn't even try to write to them.
  // - Checks there is enough ingesters for an operation to succeed.
  // The ingesters argument may be overwritten.
- func (s *DefaultReplicationStrategy) Filter(ingesters []IngesterDesc, op Operation, replicationFactor int, heartbeatTimeout time.Duration) ([]IngesterDesc, int, error) {
+ func (s *DefaultReplicationStrategy) Filter(ingesters []IngesterDesc, op Operation, replicationFactor int, heartbeatTimeout time.Duration, zoneAwarenessEnabled bool) ([]IngesterDesc, int, error) {
    // We need a response from a quorum of ingesters, which is n/2 + 1. In the
    // case of a node joining/leaving, the actual replica set might be bigger
    // than the replication factor, so use the bigger or the two.

@@ -49,8 +49,14 @@ func (s *DefaultReplicationStrategy) Filter(ingesters []IngesterDesc, op Operati
    // This is just a shortcut - if there are not minSuccess available ingesters,
    // after filtering out dead ones, don't even bother trying.
    if len(ingesters) < minSuccess {
-     err := fmt.Errorf("at least %d live replicas required, could only find %d",
-       minSuccess, len(ingesters))
+     var err error
+
+     if zoneAwarenessEnabled {
+       err = fmt.Errorf("at least %d live replicas required across different availability zones, could only find %d", minSuccess, len(ingesters))
+     } else {
+       err = fmt.Errorf("at least %d live replicas required, could only find %d", minSuccess, len(ingesters))
+     }
+
      return nil, 0, err
    }
pkg/ring/replication_strategy_test.go

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ func TestRingReplicationStrategy(t *testing.T) {

    t.Run(fmt.Sprintf("[%d]", i), func(t *testing.T) {
      strategy := &DefaultReplicationStrategy{}
-     liveIngesters, maxFailure, err := strategy.Filter(ingesters, tc.op, tc.RF, 100*time.Second)
+     liveIngesters, maxFailure, err := strategy.Filter(ingesters, tc.op, tc.RF, 100*time.Second, false)
      if tc.ExpectedError == "" {
        assert.NoError(t, err)
        assert.Equal(t, tc.LiveIngesters, len(liveIngesters))

0 commit comments
