Merged
CHANGELOG.md (4 changes: 3 additions & 1 deletion)
@@ -16,11 +16,13 @@
## 1.4.0-rc.0 in progress

* [CHANGE] Cassandra backend support is now GA (stable). #3180
* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
- `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
- `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
- `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
- `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
- `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
- `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
* [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. If you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4`, to avoid data loss. #3115
* [CHANGE] Distributor API endpoints are no longer served unless `-target` is set to `distributor` or `all`. #3112
* [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
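Because these renames are purely mechanical, updating scripts or manifests that pass CLI flags can be automated. The Go sketch below is a minimal illustration, not a Cortex tool: `renameFlag` and its rule tables are hypothetical, encode only the renames listed in this entry, and assume each argument has the `-name=value` or bare `-name` form. Prefix rules run before exact renames, so a flag such as `-experimental.store-gateway.replication-factor` is first stripped of its prefix and then renamed to `-store-gateway.sharding-ring.replication-factor`.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixRenames maps an old flag-name prefix to its new prefix.
var prefixRenames = []struct{ from, to string }{
	{"-experimental.blocks-storage.", "-blocks-storage."},
	{"-experimental.store-gateway.", "-store-gateway."},
	{"-experimental.querier.store-gateway-client.", "-querier.store-gateway-client."},
}

// exactRenames maps whole flag names. It is applied after the prefix step.
var exactRenames = map[string]string{
	"-experimental.querier.store-gateway-addresses": "-querier.store-gateway-addresses",
	"-store-gateway.replication-factor":             "-store-gateway.sharding-ring.replication-factor",
	"-store-gateway.tokens-file-path":               "-store-gateway.sharding-ring.tokens-file-path",
}

// renameFlag rewrites a single "-name=value" or bare "-name" argument.
func renameFlag(arg string) string {
	name, value, hasValue := strings.Cut(arg, "=")
	for _, r := range prefixRenames {
		if strings.HasPrefix(name, r.from) {
			name = r.to + strings.TrimPrefix(name, r.from)
			break
		}
	}
	if renamed, ok := exactRenames[name]; ok {
		name = renamed
	}
	if hasValue {
		return name + "=" + value
	}
	return name
}

func main() {
	for _, arg := range []string{
		"-experimental.blocks-storage.backend=s3",
		"-experimental.store-gateway.replication-factor=3",
		"-log.level=info",
	} {
		fmt.Println(renameFlag(arg))
	}
	// Output:
	// -blocks-storage.backend=s3
	// -store-gateway.sharding-ring.replication-factor=3
	// -log.level=info
}
```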
docs/blocks-storage/querier.md (2 changes: 1 addition & 1 deletion)
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet

Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.

The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to a recent blocks resharding event (i.e. in the last few seconds). The querier runs a consistency check on the responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries fetching samples for the missing blocks from different store-gateways (if `-store-gateway.replication-factor` is greater than `1`). If the consistency check still fails after all retries, the query fails as well (correctness is always guaranteed).
The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to a recent blocks resharding event (i.e. in the last few seconds). The querier runs a consistency check on the responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries fetching samples for the missing blocks from different store-gateways (if `-store-gateway.sharding-ring.replication-factor` is greater than `1`). If the consistency check still fails after all retries, the query fails as well (correctness is always guaranteed).

If the query time range covers a period within the last `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.

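The retry logic described in this paragraph can be sketched as follows. This is a hypothetical, simplified model rather than the querier's actual code: `queryWithConsistencyCheck`, the `queryFunc` type, and the per-block replica list are assumptions, and the store-gateway names are made up. Each round asks one replica per missing block, keeps the blocks that no response reported as queried, and fails once a block has no replicas left, which is why a replication factor of `1` leaves no room for retries.

```go
package main

import "fmt"

// queryFunc asks one store-gateway to query the given blocks and returns the
// IDs it actually queried (a hypothetical stand-in for the real gRPC call).
type queryFunc func(gateway string, blocks []string) []string

// queryWithConsistencyCheck queries every block, retrying blocks that were not
// reported as queried on their next replica. replicas[block] lists the
// store-gateways holding that block, so with replication factor 1 there is
// exactly one attempt per block.
func queryWithConsistencyCheck(blocks []string, replicas map[string][]string, query queryFunc) error {
	pending := blocks
	for attempt := 0; len(pending) > 0; attempt++ {
		// Group the still-missing blocks by the replica to try in this round.
		byGateway := map[string][]string{}
		var exhausted []string
		for _, b := range pending {
			if attempt >= len(replicas[b]) {
				exhausted = append(exhausted, b)
				continue
			}
			g := replicas[b][attempt]
			byGateway[g] = append(byGateway[g], b)
		}
		if len(exhausted) > 0 {
			// Fail rather than return partial results.
			return fmt.Errorf("consistency check failed, blocks not queried: %v", exhausted)
		}

		// Consistency check: keep only the blocks no store-gateway reported.
		queried := map[string]bool{}
		for g, bs := range byGateway {
			for _, id := range query(g, bs) {
				queried[id] = true
			}
		}
		var next []string
		for _, b := range pending {
			if !queried[b] {
				next = append(next, b)
			}
		}
		pending = next
	}
	return nil
}

func main() {
	// store-gateway-1 has not loaded block-2 yet (e.g. right after resharding).
	loaded := map[string]map[string]bool{
		"store-gateway-1": {"block-1": true},
		"store-gateway-2": {"block-1": true, "block-2": true},
	}
	query := func(gateway string, blocks []string) []string {
		var out []string
		for _, b := range blocks {
			if loaded[gateway][b] {
				out = append(out, b)
			}
		}
		return out
	}
	rf2 := map[string][]string{
		"block-1": {"store-gateway-1", "store-gateway-2"},
		"block-2": {"store-gateway-1", "store-gateway-2"},
	}
	fmt.Println(queryWithConsistencyCheck([]string{"block-1", "block-2"}, rf2, query)) // <nil>

	rf1 := map[string][]string{
		"block-1": {"store-gateway-1"},
		"block-2": {"store-gateway-1"},
	}
	fmt.Println(queryWithConsistencyCheck([]string{"block-1", "block-2"}, rf1, query))
}
```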
docs/blocks-storage/querier.template (2 changes: 1 addition & 1 deletion)
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet

Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.

The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to a recent blocks resharding event (i.e. in the last few seconds). The querier runs a consistency check on the responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries fetching samples for the missing blocks from different store-gateways (if `-store-gateway.replication-factor` is greater than `1`). If the consistency check still fails after all retries, the query fails as well (correctness is always guaranteed).
The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to a recent blocks resharding event (i.e. in the last few seconds). The querier runs a consistency check on the responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries fetching samples for the missing blocks from different store-gateways (if `-store-gateway.sharding-ring.replication-factor` is greater than `1`). If the consistency check still fails after all retries, the query fails as well (correctness is always guaranteed).

If the query time range covers a period within the last `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.

docs/blocks-storage/store-gateway.md (6 changes: 3 additions & 3 deletions)
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog

For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.

Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. Blocks replication protects against query failures caused by blocks that are not loaded by any store-gateway instance at a given time, for example in the event of a store-gateway failure or while a store-gateway instance is restarting (e.g. during a rolling update).
Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. Blocks replication protects against query failures caused by blocks that are not loaded by any store-gateway instance at a given time, for example in the event of a store-gateway failure or while a store-gateway instance is restarting (e.g. during a rolling update).

This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).
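To make the replication factor concrete, the sketch below uses a generic consistent-hashing lookup: hash the block ID, walk the token ring clockwise from the first token at or above that hash, and collect the first `rf` distinct instances. It is not Cortex's ring implementation; the FNV-1a hash, the token values, and the `ringEntry`/`replicasFor` names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ringEntry is one token owned by a store-gateway instance.
type ringEntry struct {
	token    uint32
	instance string
}

// replicasFor returns up to rf distinct instances for blockID, starting at the
// first token >= hash(blockID) and wrapping around the ring.
// The ring slice must be sorted by token.
func replicasFor(ring []ringEntry, blockID string, rf int) []string {
	h := fnv.New32a()
	h.Write([]byte(blockID))
	key := h.Sum32()

	start := sort.Search(len(ring), func(i int) bool { return ring[i].token >= key })
	seen := map[string]bool{}
	var out []string
	for i := 0; i < len(ring) && len(out) < rf; i++ {
		e := ring[(start+i)%len(ring)]
		if !seen[e.instance] { // skip instances that already hold a replica
			seen[e.instance] = true
			out = append(out, e.instance)
		}
	}
	return out
}

func main() {
	ring := []ringEntry{
		{token: 3 << 30, instance: "store-gateway-3"},
		{token: 1 << 30, instance: "store-gateway-1"},
		{token: 2 << 30, instance: "store-gateway-2"},
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].token < ring[j].token })
	fmt.Println(replicasFor(ring, "block-1", 2))
}
```

Skipping instances that were already picked is what places a block's `rf` copies on different store-gateways, so a single failed or restarting instance still leaves another replica to query.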

@@ -199,12 +199,12 @@ store_gateway:
# The replication factor to use when sharding blocks. This option needs to be
# set on both the store-gateway and querier when running in microservices
# mode.
# CLI flag: -store-gateway.replication-factor
# CLI flag: -store-gateway.sharding-ring.replication-factor
[replication_factor: <int> | default = 3]

# File path where tokens are stored. If empty, tokens are not stored at
# shutdown and restored at startup.
# CLI flag: -store-gateway.tokens-file-path
# CLI flag: -store-gateway.sharding-ring.tokens-file-path
[tokens_file_path: <string> | default = ""]

# The sharding strategy to use. Supported values are: default,
docs/blocks-storage/store-gateway.template (2 changes: 1 addition & 1 deletion)
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog

For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.

Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. Blocks replication protects against query failures caused by blocks that are not loaded by any store-gateway instance at a given time, for example in the event of a store-gateway failure or while a store-gateway instance is restarting (e.g. during a rolling update).
Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. Blocks replication protects against query failures caused by blocks that are not loaded by any store-gateway instance at a given time, for example in the event of a store-gateway failure or while a store-gateway instance is restarting (e.g. during a rolling update).

This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).

docs/configuration/config-file-reference.md (4 changes: 2 additions & 2 deletions)
@@ -3655,12 +3655,12 @@ sharding_ring:

# The replication factor to use when sharding blocks. This option needs to be
# set on both the store-gateway and querier when running in microservices mode.
# CLI flag: -store-gateway.replication-factor
# CLI flag: -store-gateway.sharding-ring.replication-factor
[replication_factor: <int> | default = 3]

# File path where tokens are stored. If empty, tokens are not stored at
# shutdown and restored at startup.
# CLI flag: -store-gateway.tokens-file-path
# CLI flag: -store-gateway.sharding-ring.tokens-file-path
[tokens_file_path: <string> | default = ""]

# The sharding strategy to use. Supported values are: default, shuffle-sharding.
integration/backward_compatibility_test.go (34 changes: 13 additions & 21 deletions)
@@ -27,15 +27,7 @@ var (
// 0.7.0 used 204 status code for all components
"quay.io/cortexproject/cortex:v0.7.0": preCortex10Flags,

"quay.io/cortexproject/cortex:v1.0.0": func(flags map[string]string) map[string]string {
return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
"-store-gateway.sharding-enabled": "",
"-store-gateway.sharding-ring.store": "",
"-store-gateway.sharding-ring.consul.hostname": "",
"-store-gateway.replication-factor": "",
})
},

"quay.io/cortexproject/cortex:v1.0.0": preCortex14Flags,
"quay.io/cortexproject/cortex:v1.1.0": preCortex14Flags,
"quay.io/cortexproject/cortex:v1.2.0": preCortex14Flags,
"quay.io/cortexproject/cortex:v1.3.0": preCortex14Flags,
@@ -44,24 +36,24 @@ var (

func preCortex10Flags(flags map[string]string) map[string]string {
return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
"-schema-config-file":                          "",
"-config-yaml":                                 flags["-schema-config-file"],
"-table-manager.poll-interval":                 "",
"-dynamodb.poll-interval":                      flags["-table-manager.poll-interval"],
"-store-gateway.sharding-enabled":              "",
"-store-gateway.sharding-ring.store":           "",
"-store-gateway.sharding-ring.consul.hostname": "",
"-store-gateway.replication-factor":            "",
"-schema-config-file":                             "",
"-config-yaml":                                    flags["-schema-config-file"],
"-table-manager.poll-interval":                    "",
"-dynamodb.poll-interval":                         flags["-table-manager.poll-interval"],
"-store-gateway.sharding-enabled":                 "",
"-store-gateway.sharding-ring.store":              "",
"-store-gateway.sharding-ring.consul.hostname":    "",
"-store-gateway.sharding-ring.replication-factor": "",
})
}

func preCortex14Flags(flags map[string]string) map[string]string {
return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
// Blocks storage CLI flags removed the "experimental" prefix in 1.4.
"-store-gateway.sharding-enabled":              "",
"-store-gateway.sharding-ring.store":           "",
"-store-gateway.sharding-ring.consul.hostname": "",
"-store-gateway.replication-factor":            "",
"-store-gateway.sharding-enabled":                 "",
"-store-gateway.sharding-ring.store":              "",
"-store-gateway.sharding-ring.consul.hostname":    "",
"-store-gateway.sharding-ring.replication-factor": "",
})
}

integration/e2ecortex/services.go (16 changes: 8 additions & 8 deletions)
@@ -83,10 +83,10 @@ func NewQuerierWithConfigFile(name, consulAddress, configFile string, flags map[
"-querier.frontend-client.backoff-retries": "1",
"-querier.worker-parallelism": "1",
// Store-gateway ring backend.
"-store-gateway.sharding-enabled":              "true",
"-store-gateway.sharding-ring.store":           "consul",
"-store-gateway.sharding-ring.consul.hostname": consulAddress,
"-store-gateway.replication-factor":            "1",
"-store-gateway.sharding-enabled":                 "true",
"-store-gateway.sharding-ring.store":              "consul",
"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
"-store-gateway.sharding-ring.replication-factor": "1",
}, flags))...),
e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
httpPort,
@@ -114,10 +114,10 @@ func NewStoreGatewayWithConfigFile(name, consulAddress, configFile string, flags
"-target": "store-gateway",
"-log.level": "warn",
// Store-gateway ring backend.
"-store-gateway.sharding-enabled":              "true",
"-store-gateway.sharding-ring.store":           "consul",
"-store-gateway.sharding-ring.consul.hostname": consulAddress,
"-store-gateway.replication-factor":            "1",
"-store-gateway.sharding-enabled":                 "true",
"-store-gateway.sharding-ring.store":              "consul",
"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
"-store-gateway.sharding-ring.replication-factor": "1",
}, flags))...),
e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
httpPort,
integration/querier_test.go (8 changes: 4 additions & 4 deletions)
@@ -295,10 +295,10 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
// Distributor.
"-distributor.replication-factor": strconv.FormatInt(seriesReplicationFactor, 10),
// Store-gateway.
"-store-gateway.sharding-enabled":              strconv.FormatBool(testCfg.blocksShardingEnabled),
"-store-gateway.sharding-ring.store":           "consul",
"-store-gateway.sharding-ring.consul.hostname": consul.NetworkHTTPEndpoint(),
"-store-gateway.replication-factor":            "1",
"-store-gateway.sharding-enabled":                 strconv.FormatBool(testCfg.blocksShardingEnabled),
"-store-gateway.sharding-ring.store":              "consul",
"-store-gateway.sharding-ring.consul.hostname":    consul.NetworkHTTPEndpoint(),
"-store-gateway.sharding-ring.replication-factor": "1",
})

// Start Cortex replicas.
pkg/storegateway/gateway_ring.go (20 changes: 11 additions & 9 deletions)
@@ -61,19 +61,21 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
os.Exit(1)
}

ringFlagsPrefix := "store-gateway.sharding-ring."

// Ring flags
cfg.KVStore.RegisterFlagsWithPrefix("store-gateway.sharding-ring.", "collectors/", f)
f.DurationVar(&cfg.HeartbeatPeriod, "store-gateway.sharding-ring.heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
f.DurationVar(&cfg.HeartbeatTimeout, "store-gateway.sharding-ring.heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
f.IntVar(&cfg.ReplicationFactor, "store-gateway.replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
f.StringVar(&cfg.TokensFilePath, "store-gateway.tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")
cfg.KVStore.RegisterFlagsWithPrefix(ringFlagsPrefix, "collectors/", f)
f.DurationVar(&cfg.HeartbeatPeriod, ringFlagsPrefix+"heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
f.DurationVar(&cfg.HeartbeatTimeout, ringFlagsPrefix+"heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
f.IntVar(&cfg.ReplicationFactor, ringFlagsPrefix+"replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
f.StringVar(&cfg.TokensFilePath, ringFlagsPrefix+"tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")

// Instance flags
cfg.InstanceInterfaceNames = []string{"eth0", "en0"}
f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), "store-gateway.sharding-ring.instance-interface", "Name of network interface to read address from.")
f.StringVar(&cfg.InstanceAddr, "store-gateway.sharding-ring.instance-addr", "", "IP address to advertise in the ring.")
f.IntVar(&cfg.InstancePort, "store-gateway.sharding-ring.instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
f.StringVar(&cfg.InstanceID, "store-gateway.sharding-ring.instance-id", hostname, "Instance ID to register in the ring.")
f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), ringFlagsPrefix+"instance-interface", "Name of network interface to read address from.")
f.StringVar(&cfg.InstanceAddr, ringFlagsPrefix+"instance-addr", "", "IP address to advertise in the ring.")
f.IntVar(&cfg.InstancePort, ringFlagsPrefix+"instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
f.StringVar(&cfg.InstanceID, ringFlagsPrefix+"instance-id", hostname, "Instance ID to register in the ring.")

// Defaults for internal settings.
cfg.RingCheckPeriod = 5 * time.Second
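Deriving every ring flag from `ringFlagsPrefix` is what moves `replication-factor` and `tokens-file-path` under `store-gateway.sharding-ring.`. The standalone sketch below uses only Go's standard `flag` package; `ringConfig` and `parse` are trimmed, hypothetical stand-ins for `RingConfig` and its caller. With `flag.ContinueOnError`, the new name parses while the pre-1.4 name is rejected as undefined, since this sketch registers no alias for the old name.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// ringConfig is a trimmed, hypothetical stand-in for the store-gateway RingConfig.
type ringConfig struct {
	ReplicationFactor int
	TokensFilePath    string
}

// RegisterFlags derives every flag name from one shared prefix.
func (cfg *ringConfig) RegisterFlags(f *flag.FlagSet) {
	ringFlagsPrefix := "store-gateway.sharding-ring."
	f.IntVar(&cfg.ReplicationFactor, ringFlagsPrefix+"replication-factor", 3, "The replication factor to use when sharding blocks.")
	f.StringVar(&cfg.TokensFilePath, ringFlagsPrefix+"tokens-file-path", "", "File path where tokens are stored.")
}

// parse registers the flags on a fresh FlagSet and parses args.
func parse(args ...string) (ringConfig, error) {
	var cfg ringConfig
	fs := flag.NewFlagSet("cortex", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parse("-store-gateway.sharding-ring.replication-factor=2")
	fmt.Println(cfg.ReplicationFactor, err) // 2 <nil>

	_, err = parse("-store-gateway.replication-factor=2")
	fmt.Println(err) // flag provided but not defined: -store-gateway.replication-factor
}
```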