diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0a58655f16d..69e07686e75 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,11 +5,13 @@
 ## 1.4.0-rc.0 / 2020-09-15
 
 * [CHANGE] Cassandra backend support is now GA (stable). #3180
-* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
+* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
   - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
   - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
   - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
   - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
+  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
+  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
 * [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
 * [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
 * [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
diff --git a/docs/blocks-storage/querier.md b/docs/blocks-storage/querier.md
index 559ff64a4a3..6a03b97a3a4 100644
--- a/docs/blocks-storage/querier.md
+++ b/docs/blocks-storage/querier.md
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
 
diff --git a/docs/blocks-storage/querier.template b/docs/blocks-storage/querier.template
index 79fdb3d3f1a..97d59a25957 100644
--- a/docs/blocks-storage/querier.template
+++ b/docs/blocks-storage/querier.template
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
 
diff --git a/docs/blocks-storage/store-gateway.md b/docs/blocks-storage/store-gateway.md
index 8877a772a78..c145c01776e 100644
--- a/docs/blocks-storage/store-gateway.md
+++ b/docs/blocks-storage/store-gateway.md
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog
 
 For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.
 
-Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
+Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
 
 This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).
 
@@ -199,12 +199,12 @@ store_gateway:
     # The replication factor to use when sharding blocks. This option needs be
     # set both on the store-gateway and querier when running in microservices
     # mode.
-    # CLI flag: -store-gateway.replication-factor
+    # CLI flag: -store-gateway.sharding-ring.replication-factor
     [replication_factor: <int> | default = 3]
 
     # File path where tokens are stored. If empty, tokens are not stored at
     # shutdown and restored at startup.
-    # CLI flag: -store-gateway.tokens-file-path
+    # CLI flag: -store-gateway.sharding-ring.tokens-file-path
     [tokens_file_path: <string> | default = ""]
 
     # The sharding strategy to use. Supported values are: default,
diff --git a/docs/blocks-storage/store-gateway.template b/docs/blocks-storage/store-gateway.template
index 6f49293c579..8c95335df7a 100644
--- a/docs/blocks-storage/store-gateway.template
+++ b/docs/blocks-storage/store-gateway.template
@@ -31,7 +31,7 @@ Store-gateways continuously monitor the ring state and whenever the ring topolog
 
 For each block belonging to a store-gateway shard, the store-gateway loads its `meta.json`, the `deletion-mark.json` and the index-header. Once a block is loaded on the store-gateway, it's ready to be queried by queriers. When the querier queries blocks through a store-gateway, the response will contain the list of actually queried block IDs. If a querier tries to query a block which has not been loaded by a store-gateway, the querier will either retry on a different store-gateway (if blocks replication is enabled) or fail the query.
 
-Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
+Blocks can be replicated across multiple store-gateway instances based on a replication factor configured via `-store-gateway.sharding-ring.replication-factor`. The blocks replication is used to protect from query failures caused by some blocks not loaded by any store-gateway instance at a given time like, for example, in the event of a store-gateway failure or while restarting a store-gateway instance (e.g. during a rolling update).
 
 This feature can be enabled via `-store-gateway.sharding-enabled=true` and requires the backend [hash ring](../architecture.md#the-hash-ring) to be configured via `-store-gateway.sharding-ring.*` flags (or their respective YAML config options).
 
diff --git a/docs/configuration/config-file-reference.md b/docs/configuration/config-file-reference.md
index b357043aea4..2c2558cfbec 100644
--- a/docs/configuration/config-file-reference.md
+++ b/docs/configuration/config-file-reference.md
@@ -3622,12 +3622,12 @@ sharding_ring:
 
   # The replication factor to use when sharding blocks. This option needs be set
   # both on the store-gateway and querier when running in microservices mode.
-  # CLI flag: -store-gateway.replication-factor
+  # CLI flag: -store-gateway.sharding-ring.replication-factor
   [replication_factor: <int> | default = 3]
 
   # File path where tokens are stored. If empty, tokens are not stored at
   # shutdown and restored at startup.
-  # CLI flag: -store-gateway.tokens-file-path
+  # CLI flag: -store-gateway.sharding-ring.tokens-file-path
   [tokens_file_path: <string> | default = ""]
 
   # The sharding strategy to use. Supported values are: default, shuffle-sharding.
diff --git a/integration/backward_compatibility_test.go b/integration/backward_compatibility_test.go
index 63d0a5ba18c..bc142c9e9aa 100644
--- a/integration/backward_compatibility_test.go
+++ b/integration/backward_compatibility_test.go
@@ -27,15 +27,7 @@ var (
 		// 0.7.0 used 204 status code for all components
 		"quay.io/cortexproject/cortex:v0.7.0": preCortex10Flags,
 
-		"quay.io/cortexproject/cortex:v1.0.0": func(flags map[string]string) map[string]string {
-			return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
-				"-store-gateway.sharding-enabled":              "",
-				"-store-gateway.sharding-ring.store":           "",
-				"-store-gateway.sharding-ring.consul.hostname": "",
-				"-store-gateway.replication-factor":            "",
-			})
-		},
-
+		"quay.io/cortexproject/cortex:v1.0.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.1.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.2.0": preCortex14Flags,
 		"quay.io/cortexproject/cortex:v1.3.0": preCortex14Flags,
@@ -44,24 +36,24 @@ var (
 
 func preCortex10Flags(flags map[string]string) map[string]string {
 	return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
-		"-schema-config-file":                          "",
-		"-config-yaml":                                 flags["-schema-config-file"],
-		"-table-manager.poll-interval":                 "",
-		"-dynamodb.poll-interval":                      flags["-table-manager.poll-interval"],
-		"-store-gateway.sharding-enabled":              "",
-		"-store-gateway.sharding-ring.store":           "",
-		"-store-gateway.sharding-ring.consul.hostname": "",
-		"-store-gateway.replication-factor":            "",
+		"-schema-config-file":                             "",
+		"-config-yaml":                                    flags["-schema-config-file"],
+		"-table-manager.poll-interval":                    "",
+		"-dynamodb.poll-interval":                         flags["-table-manager.poll-interval"],
+		"-store-gateway.sharding-enabled":                 "",
+		"-store-gateway.sharding-ring.store":              "",
+		"-store-gateway.sharding-ring.consul.hostname":    "",
+		"-store-gateway.sharding-ring.replication-factor": "",
 	})
 }
 
 func preCortex14Flags(flags map[string]string) map[string]string {
 	return e2e.MergeFlagsWithoutRemovingEmpty(flags, map[string]string{
 		// Blocks storage CLI flags removed the "experimental" prefix in 1.4.
-		"-store-gateway.sharding-enabled":              "",
-		"-store-gateway.sharding-ring.store":           "",
-		"-store-gateway.sharding-ring.consul.hostname": "",
-		"-store-gateway.replication-factor":            "",
+		"-store-gateway.sharding-enabled":                 "",
+		"-store-gateway.sharding-ring.store":              "",
+		"-store-gateway.sharding-ring.consul.hostname":    "",
+		"-store-gateway.sharding-ring.replication-factor": "",
 	})
 }
 
diff --git a/integration/e2ecortex/services.go b/integration/e2ecortex/services.go
index 434bf26de0d..f0304195325 100644
--- a/integration/e2ecortex/services.go
+++ b/integration/e2ecortex/services.go
@@ -83,10 +83,10 @@ func NewQuerierWithConfigFile(name, consulAddress, configFile string, flags map[
 			"-querier.frontend-client.backoff-retries": "1",
 			"-querier.worker-parallelism": "1",
 			// Store-gateway ring backend.
-			"-store-gateway.sharding-enabled":              "true",
-			"-store-gateway.sharding-ring.store":           "consul",
-			"-store-gateway.sharding-ring.consul.hostname": consulAddress,
-			"-store-gateway.replication-factor":            "1",
+			"-store-gateway.sharding-enabled":                 "true",
+			"-store-gateway.sharding-ring.store":              "consul",
+			"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
+			"-store-gateway.sharding-ring.replication-factor": "1",
 		}, flags))...),
 		e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
 		httpPort,
@@ -114,10 +114,10 @@ func NewStoreGatewayWithConfigFile(name, consulAddress, configFile string, flags
 			"-target": "store-gateway",
 			"-log.level": "warn",
 			// Store-gateway ring backend.
-			"-store-gateway.sharding-enabled":              "true",
-			"-store-gateway.sharding-ring.store":           "consul",
-			"-store-gateway.sharding-ring.consul.hostname": consulAddress,
-			"-store-gateway.replication-factor":            "1",
+			"-store-gateway.sharding-enabled":                 "true",
+			"-store-gateway.sharding-ring.store":              "consul",
+			"-store-gateway.sharding-ring.consul.hostname":    consulAddress,
+			"-store-gateway.sharding-ring.replication-factor": "1",
 		}, flags))...),
 		e2e.NewHTTPReadinessProbe(httpPort, "/ready", 200, 299),
 		httpPort,
diff --git a/integration/querier_test.go b/integration/querier_test.go
index ebc80b273ca..6093135aaa1 100644
--- a/integration/querier_test.go
+++ b/integration/querier_test.go
@@ -295,10 +295,10 @@ func TestQuerierWithBlocksStorageRunningInSingleBinaryMode(t *testing.T) {
 				// Distributor.
 				"-distributor.replication-factor": strconv.FormatInt(seriesReplicationFactor, 10),
 				// Store-gateway.
-				"-store-gateway.sharding-enabled":              strconv.FormatBool(testCfg.blocksShardingEnabled),
-				"-store-gateway.sharding-ring.store":           "consul",
-				"-store-gateway.sharding-ring.consul.hostname": consul.NetworkHTTPEndpoint(),
-				"-store-gateway.replication-factor":            "1",
+				"-store-gateway.sharding-enabled":                 strconv.FormatBool(testCfg.blocksShardingEnabled),
+				"-store-gateway.sharding-ring.store":              "consul",
+				"-store-gateway.sharding-ring.consul.hostname":    consul.NetworkHTTPEndpoint(),
+				"-store-gateway.sharding-ring.replication-factor": "1",
 			})
 
 			// Start Cortex replicas.
diff --git a/pkg/storegateway/gateway_ring.go b/pkg/storegateway/gateway_ring.go
index fdb15b69d1b..3c890f138f7 100644
--- a/pkg/storegateway/gateway_ring.go
+++ b/pkg/storegateway/gateway_ring.go
@@ -61,19 +61,21 @@ func (cfg *RingConfig) RegisterFlags(f *flag.FlagSet) {
 		os.Exit(1)
 	}
 
+	ringFlagsPrefix := "store-gateway.sharding-ring."
+
 	// Ring flags
-	cfg.KVStore.RegisterFlagsWithPrefix("store-gateway.sharding-ring.", "collectors/", f)
-	f.DurationVar(&cfg.HeartbeatPeriod, "store-gateway.sharding-ring.heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
-	f.DurationVar(&cfg.HeartbeatTimeout, "store-gateway.sharding-ring.heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
-	f.IntVar(&cfg.ReplicationFactor, "store-gateway.replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
-	f.StringVar(&cfg.TokensFilePath, "store-gateway.tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")
+	cfg.KVStore.RegisterFlagsWithPrefix(ringFlagsPrefix, "collectors/", f)
+	f.DurationVar(&cfg.HeartbeatPeriod, ringFlagsPrefix+"heartbeat-period", 15*time.Second, "Period at which to heartbeat to the ring.")
+	f.DurationVar(&cfg.HeartbeatTimeout, ringFlagsPrefix+"heartbeat-timeout", time.Minute, "The heartbeat timeout after which store gateways are considered unhealthy within the ring."+sharedOptionWithQuerier)
+	f.IntVar(&cfg.ReplicationFactor, ringFlagsPrefix+"replication-factor", 3, "The replication factor to use when sharding blocks."+sharedOptionWithQuerier)
+	f.StringVar(&cfg.TokensFilePath, ringFlagsPrefix+"tokens-file-path", "", "File path where tokens are stored. If empty, tokens are not stored at shutdown and restored at startup.")
 
 	// Instance flags
 	cfg.InstanceInterfaceNames = []string{"eth0", "en0"}
-	f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), "store-gateway.sharding-ring.instance-interface", "Name of network interface to read address from.")
-	f.StringVar(&cfg.InstanceAddr, "store-gateway.sharding-ring.instance-addr", "", "IP address to advertise in the ring.")
-	f.IntVar(&cfg.InstancePort, "store-gateway.sharding-ring.instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
-	f.StringVar(&cfg.InstanceID, "store-gateway.sharding-ring.instance-id", hostname, "Instance ID to register in the ring.")
+	f.Var((*flagext.StringSlice)(&cfg.InstanceInterfaceNames), ringFlagsPrefix+"instance-interface", "Name of network interface to read address from.")
+	f.StringVar(&cfg.InstanceAddr, ringFlagsPrefix+"instance-addr", "", "IP address to advertise in the ring.")
+	f.IntVar(&cfg.InstancePort, ringFlagsPrefix+"instance-port", 0, "Port to advertise in the ring (defaults to server.grpc-listen-port).")
+	f.StringVar(&cfg.InstanceID, ringFlagsPrefix+"instance-id", hostname, "Instance ID to register in the ring.")
 
 	// Defaults for internal settings.
 	cfg.RingCheckPeriod = 5 * time.Second