This repository was archived by the owner on Apr 28, 2025. It is now read-only.

Conversation

@pstibrany
Contributor

This PR adds a flag to enable streaming of chunks from block-based ingesters.
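
As a rough illustration of how such a flag might be switched on from a consuming environment, here is a minimal Jsonnet sketch. The parameter name `ingester_stream_chunks_when_using_blocks` is taken from a later commit message in this thread; its placement under `_config` and its mapping to an ingester CLI flag are assumptions, not something this PR description confirms.

```jsonnet
{
  _config+:: {
    // Assumed parameter name and location; check config.libsonnet
    // in the library version you vendor before relying on it.
    ingester_stream_chunks_when_using_blocks: true,
  },
}
```

Merging this object over the library's defaults would, under those assumptions, make block-based ingesters stream chunks to queriers.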

@pstibrany pstibrany requested a review from a team as a code owner March 17, 2021 09:55
Collaborator

@pracucci pracucci left a comment

👍

@pstibrany pstibrany merged commit 5cf0c4f into main Mar 17, 2021
@pstibrany pstibrany deleted the chunk-streaming branch March 17, 2021 10:04
simonswine pushed a commit to grafana/mimir that referenced this pull request Dec 20, 2021
pracucci added a commit to grafana/mimir that referenced this pull request Dec 20, 2021
* Added mega_user class

Signed-off-by: Marco Pracucci <[email protected]>

* Fine-tune blocks storage config

Signed-off-by: Marco Pracucci <[email protected]>

* Disable tests by default to fix README instructions

Ref grafana/cortex-jsonnet#95

* Run store-gateway without CPU limits

Signed-off-by: Marco Pracucci <[email protected]>

* Use v1 API for Deployment and StatefulSet resources

* Version bump to v1.1.0

* Actually include the ruler

* Update config option name

* Added ruler_enabled and alertmanager_enabled flags. (grafana/cortex-jsonnet#116)

* Added publish not ready addresses

Signed-off-by: Joe Elliott <[email protected]>

* Removed -experimental.tsdb.store-gateway-enabled flag

Signed-off-by: Marco Pracucci <[email protected]>

* Added a discovery svc and pointed the querier service at itself

Signed-off-by: Joe Elliott <[email protected]>

* lint

Signed-off-by: Joe Elliott <[email protected]>

* Added PodDisruptionBudget for store-gateway

Signed-off-by: Marco Pracucci <[email protected]>

* Allow to configure the blocks replication factor

Signed-off-by: Marco Pracucci <[email protected]>

* Switch store-gateway StatefulSets to Parallel Pod Management

Signed-off-by: Marco Pracucci <[email protected]>

* Ruler should use metadata cache as well, if configured. (grafana/cortex-jsonnet#128)

Ruler instantiates querier internally, so it can use metadata cache.

* Allow to customize ingester disk size and class

Signed-off-by: Marco Pracucci <[email protected]>

* Version bump to 1.2.0

* refactor: use jaeger-agent-mixin

lib got moved: grafana/jsonnet-libshttps://github.com/grafana/cortex-jsonnet/pull/291

used jb-0.4.0 which updates the jsonnetfile.json format

* Switch blocks storage ingesters to Parallel pod management policy and 4d retention

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed comment

Signed-off-by: Marco Pracucci <[email protected]>

* Chunks blocks migration (grafana/cortex-jsonnet#148)

* Allow configuring querier with second store engine.

* Introduced newIngesterStatefulSet and newIngesterPdb functions.

* Rename parameters to be more clear.

* refactor(cortex): use first class citizens

for:
* requiredDuringSchedulingIgnoredDuringExecutionType
* portsType

These are available from: https://github.com/jsonnet-libs/k8s-alpha

* Update blocks storage CLI flags

Signed-off-by: Marco Pracucci <[email protected]>

* Do not apply blocks storage config to query-frontend, table-manager and purger

Signed-off-by: Marco Pracucci <[email protected]>

* Cleaned up blocks storage config

Signed-off-by: Marco Pracucci <[email protected]>

* Apply chunks-store config if primary or secondary store use chunks. (grafana/cortex-jsonnet#160)

* Enable table manager when using chunks storage as secondary storage engine for querier. (grafana/cortex-jsonnet#161)

* fix(ksonnet): backwards compatibility with ksonnet

* add overrides config to tsdb store-gateway

* Add jsonnet for ingester StatefulSet with WAL (grafana/cortex-jsonnet#72)

* Add jsonnet for ingester StatefulSet with WAL

Signed-off-by: Ganesh Vernekar <[email protected]>

* Add CHANGELOG entry

Signed-off-by: Ganesh Vernekar <[email protected]>

* Fix lint

Signed-off-by: Ganesh Vernekar <[email protected]>

* Fix review comments

Signed-off-by: Ganesh Vernekar <[email protected]>

* Change max query length to 32 days

To allow for comparison over months of 31d

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix ruler S3 config option (grafana/cortex-jsonnet#174)

* Removed -experimental.tsdb.store-gateway-enabled flag

Signed-off-by: Marco Pracucci <[email protected]>

* Use correct config variable for s3 ruler config

* restore dropped line

Co-authored-by: Marco Pracucci <[email protected]>

* Add support for local ruler_client_type (grafana/cortex-jsonnet#175)

* Support Alertmanager HA

With this, we can now increase the number of replicas for a
Cortex AM, thus enabling HA.

Please note that alerts themselves are not gossiped between
Alertmanagers. Each Ruler needs to send the alert to every available
Alertmanager, which is why a headless service gets created when the
number of replicas is more than 1.

* Setup the gossip port

* s/isGossiping/isHa

* Bump to 3 replicas by default

* Bump the cortex image, the latest stable is 1.3

* Fix typo in Alertmanager configuration

* Alertmanager configuration tweaks

- Introduces the `fallback_config` option to allow an Alertmanager to
  have a fallback config.
- Gives the headless service a different name to allow seamless
  switching between 1 or multiple replicas. The cluster field in the
  service metadata is immutable, which made it impossible to create the
  new service without deleting the previous one.

* Remove different name for a headless service

Sadly, we can't have a different name for the headless service as the
statefulset is configured to match its name.

* Fix ruler s3 storage configuration

* Block storage support for s3

* Added Azure support to blocks storage

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed linter

Signed-off-by: Marco Pracucci <[email protected]>

* Removed the experimental prefix from blocks storage CLI flags

Signed-off-by: Marco Pracucci <[email protected]>

* Lower default ingestion limits and create a new overrides user

* Address review feedback

* Bump default series limit by 50%

* Add flusher job for blocks.

* Fixed Azure account name/key config

Signed-off-by: Marco Pracucci <[email protected]>

* Rename changed flags for 1.4 release.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Make sure only a single ruler rolls out at a time

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Cut 1.4.0

Signed-off-by: Marco Pracucci <[email protected]>

* Add overrides exporter

The overrides exporter is part of grafana/cortex-tools and exposes runtime
overrides and related presets of Cortex as metrics.

Signed-off-by: Christian Simon <[email protected]>

* Refactor limits and overrides

Ensure we expose 'extra_small_user' and reference it setting the
"default" values.

This will raise the limits of the 'small_user' preset to the defaults
for `ingester.max-samples-per-query` and
`ingester.max-series-per-query`.

Signed-off-by: Christian Simon <[email protected]>

* Removed support for ingester.statefulset_replicas

Signed-off-by: Marco Pracucci <[email protected]>

* Switch compactor statefulset to Parallel pod management policy

Signed-off-by: Marco Pracucci <[email protected]>

* Cut 1.5.0 release

Signed-off-by: Marco Pracucci <[email protected]>

* Add ruler limits

Sets default presets for all the 'users' when it comes to ruler
limits.

* Add for the last user

* Enabled compactor sharding

Signed-off-by: Marco Pracucci <[email protected]>

* Rollback PR 213

Signed-off-by: Marco Pracucci <[email protected]>

* Re-introduce ruler limits

Signed-off-by: Marco Pracucci <[email protected]>

* [fixup] ruler limits config key name

Ruler limits have a prefix of `ruler_` on the config key name. This
makes the key match and then uses them as the value for the flags.

* Removed postings-compression-enabled

Signed-off-by: Marco Pracucci <[email protected]>

* Fine-tuned gRPC keepalive pings settings

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed gRPC settings

Signed-off-by: Marco Pracucci <[email protected]>

* Release 1.6.0

Signed-off-by: Marco Pracucci <[email protected]>

* Add option to configure unregister ingesters on shutdown

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed config

Signed-off-by: Marco Pracucci <[email protected]>

* Improved comment

Signed-off-by: Marco Pracucci <[email protected]>

* Updated doc

Signed-off-by: Marco Pracucci <[email protected]>

* Removed ifs

Signed-off-by: Marco Pracucci <[email protected]>

* Updated comment

Signed-off-by: Marco Pracucci <[email protected]>

* Fixed syntax error

Signed-off-by: Marco Pracucci <[email protected]>

* Remove misleading comment (grafana/cortex-jsonnet#243)

Signed-off-by: Marco Pracucci <[email protected]>

* Add option to customise the configmap name

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix for real

Signed-off-by: Marco Pracucci <[email protected]>

* Added bucket index flag, and enable bucket index by default. (grafana/cortex-jsonnet#254)

* Cleanup blocks storage config

Signed-off-by: Marco Pracucci <[email protected]>

* feat: allow for Alertmanager to configure multiple storage backends

Signed-off-by: Jacob Lisi <[email protected]>

* Update cortex/config.libsonnet

Co-authored-by: gotjosh <[email protected]>

* Update cortex/alertmanager.libsonnet

Co-authored-by: gotjosh <[email protected]>

* Release 1.7.0. (grafana/cortex-jsonnet#260)

* Release 1.7.0.

* cortex: config: Fix error message for alertmanager_client_type.

* cortex: alertmanager: Remove space in dot notation.

* Up metadata connection limits

* Add flag to enable streaming of chunks. (grafana/cortex-jsonnet#276)

Signed-off-by: Peter Štibraný <[email protected]>

* Add recording rules to calculate Cortex scaling

- Update dashboard so it only shows under-provisioned services and why
- Add sizing rules based on limits.
- Add some docs to the dashboard.

Signed-off-by: Tom Wilkie <[email protected]>

* chore: update lib to use new API paths

Signed-off-by: Jacob Lisi <[email protected]>

* Create 1.8.0 release. (grafana/cortex-jsonnet#282)

* Create 1.8.0 release.

Signed-off-by: Peter Štibraný <[email protected]>

* Update image tags.

Signed-off-by: Peter Štibraný <[email protected]>

* Do not use deprecated Alertmanager cluster flags

Signed-off-by: Marco Pracucci <[email protected]>

* fix: Update ksonnet-util vendor lock

The previous version `c19a92e586a6752f11745b47f309b13f02ef7147` is
incompatible with the library in its current form. For example in
`tsdb.libsonnet` L81, we use `pvc.new('ingester-pvc')` but at the
locked version, in `ksonnet-util/kausal.libsonnet` the `pvc.new`
function takes no arguments.

Signed-off-by: Jack Baldry <[email protected]>

* Add function to customize compactor statefulset

Signed-off-by: Marco Pracucci <[email protected]>

* Add querier_service_ignored_labels (grafana/cortex-jsonnet#291)

Co-authored-by: Victor Tsang Hi <[email protected]>

* Introduce ingester instance limits to configuration, and add alerts. (grafana/cortex-jsonnet#296)

* Introduce ingester instance limits to configuration, and add alerts.

* CHANGELOG.md

* Address (internal) review feedback.

* Add `query-scheduler.libsonnet` (grafana/cortex-jsonnet#295)

* Add query-scheduler.libsonnet.

* CHANGELOG.md

* Use flag to enable query-scheduler.

* Fix image.

* Replace use of querier.compress-http-responses removed in Cortex 1.9

Signed-off-by: Nick Pillitteri <[email protected]>

* Enable index-header lazy loading in store-gateway

Signed-off-by: Marco Pracucci <[email protected]>

* Do not use deprecated/removed flag -limits.per-user-override-config

Signed-off-by: Marco Pracucci <[email protected]>

* Use new ruler storage config and enable API compression

Signed-off-by: Marco Pracucci <[email protected]>

* Changed alertmanager config to use the new storage config

Signed-off-by: Marco Pracucci <[email protected]>

* Cut release 1.9.0

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Mount overrides configmap to alertmanager too

Signed-off-by: Marco Pracucci <[email protected]>

* Upgrade memcached

Signed-off-by: Marco Pracucci <[email protected]>

* Increase default store-gateway memory request and limit

Signed-off-by: Marco Pracucci <[email protected]>

* Fix

Signed-off-by: Marco Pracucci <[email protected]>

* Set -server.grpc-max-*-msg-size-bytes for ruler and ingester. (grafana/cortex-jsonnet#326)

* Fixed --alertmanager.cluster.peers

Signed-off-by: Marco Pracucci <[email protected]>

* Set empty alertmanager listen address with 1 replica

Alertmanager tries to start clustering unless the flag is explicitly set as an empty string
https://github.com/prometheus/alertmanager#turn-off-high-availability
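
The behaviour described above can be sketched in Jsonnet roughly as follows. This is an illustration only: the helper `alertmanagerClusterArgs`, the default gossip port, and the `$(POD_IP)` placeholder are hypothetical, and the flag name `alertmanager.cluster.listen-address` is assumed from the Cortex Alertmanager clustering flags rather than stated in this commit.

```jsonnet
// Hypothetical helper: builds the clustering argument from a replica count.
local alertmanagerClusterArgs(replicas, gossipPort=9094) =
  if replicas > 1 then {
    // With several replicas, listen for gossip on the pod address.
    'alertmanager.cluster.listen-address': '[$(POD_IP)]:%d' % gossipPort,
  } else {
    // With one replica, an explicit empty string turns clustering off.
    'alertmanager.cluster.listen-address': '',
  };

{
  single: alertmanagerClusterArgs(1),
  ha: alertmanagerClusterArgs(3),
}
```

The point of the sketch is that the single-replica case sets the key to an empty string instead of omitting it, since omitting it would leave the default in place and clustering would still start.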

* Add option to disable anti-affinity in newIngesterStatefulSet()

Signed-off-by: Marco Pracucci <[email protected]>

* Fix alertmanager config change introduced in grafana/cortex-jsonnet#344

Signed-off-by: Marco Pracucci <[email protected]>

* Create another tier with 300K active series

The other tiers have a 3x jump except when we go from 100K to 1Mil. I
think we should have a 3x jump for the first tier too.

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Improve config settings based on recent learnings

Signed-off-by: Marco Pracucci <[email protected]>

* Added functions to create query-frontend and querier deployments

Signed-off-by: Marco Pracucci <[email protected]>

* Added function to create query-scheduler deployment

Signed-off-by: Marco Pracucci <[email protected]>

* chore: upgrade to latest etcd-operator

Brings: grafana/jsonnet-libs#480

* Alertmanager: Allow storage configuration to support Azure

The alertmanager configuration did not have support for Azure. Let's add it.

* remove new line

* Fix comment on medium_small_user config

It says it should be 100k + 50%, but that's what extra_small_user is.
Here we have 300k, which is 200k + 50%.

Signed-off-by: Oleg Zaytsev <[email protected]>

* Remove wrong comment

Signed-off-by: Oleg Zaytsev <[email protected]>

* Add overrides to compactor

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Split limits config into a variable we can reuse

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Review feedback

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Fix missing ruler limits

Damn, missed this in grafana/cortex-jsonnet#391

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Alertmanager: Add sharding configuration.

* Fix `compactor_blocks_retention_period` type in `extra_small_user` (grafana/cortex-jsonnet#395)

* Fix `compactor_blocks_retention_period` type in `extra_small_user`

The actual type of `compactor_blocks_retention_period` is `model.Duration`, which comes
from the Prometheus `common` package.

The problem is that `model.Duration` has a custom JSON unmarshaller that treats the incoming
value as a string.
https://github.com/prometheus/common/blob/main/model/time.go#L276

So setting it as an integer won't work when unmarshalling from JSON.

NOTE: This won't be an issue for YamlUnmarshal, as it always treats the value as a string (even
if you write it as an integer)
https://github.com/prometheus/common/blob/main/model/time.go#L307
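
To make the difference concrete, here is a small Jsonnet sketch of the two forms. The object layout and the `'31d'` value are hypothetical and chosen only for illustration; the actual preset and its value live in the library's configuration.

```jsonnet
{
  // Hypothetical preset layout, for illustration only.
  extra_small_user: {
    // Rendered to JSON as a string, which the custom JSON
    // unmarshaller of model.Duration expects.
    compactor_blocks_retention_period: '31d',
  },

  // The problematic form: rendered to JSON as a bare number,
  // which fails to unmarshal into model.Duration.
  broken_example: {
    compactor_blocks_retention_period: 0,
  },
}
```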

* update CHANGELOG

* Update rule limits to be in line with customer expectations

We built the initial rules on guesswork and now we're updating them
based on what the customers are asking for.

Further, the ruler can be horizontally scaled and we're happy letting
our users have more rules!

Signed-off-by: Goutham Veeramachaneni <[email protected]>

* Remove max_samples_per_query limit. (grafana/cortex-jsonnet#397)

* Remove max_samples_per_query limit.

* Fixed CHANGELOG.md

* Removed chunks storage query sharding config support

Signed-off-by: Marco Pracucci <[email protected]>

* Add queryEngineConfig

Signed-off-by: Marco Pracucci <[email protected]>

* tsdb: Add multi concurrency and max idle connections store gateway params

Signed-off-by: Arve Knudsen <[email protected]>

* Update cortex/tsdb.libsonnet

Co-authored-by: Marco Pracucci <[email protected]>

* Fix formatting

Signed-off-by: Arve Knudsen <[email protected]>

* tsdb: Use literal numbers instead of variables

Signed-off-by: Arve Knudsen <[email protected]>

* cortex: Make ruler object storage support generic

Signed-off-by: Arve Knudsen <[email protected]>

* Remove ruler-storage.gcs.bucket-name for Azure

Signed-off-by: Arve Knudsen <[email protected]>

* cortex: Define Azure ruler args

Signed-off-by: Arve Knudsen <[email protected]>

* Parameterize

Signed-off-by: Arve Knudsen <[email protected]>

* Further document ingester_stream_chunks_when_using_blocks parameter

Signed-off-by: Arve Knudsen <[email protected]>

* Add options to disable anti-affinity

Signed-off-by: Marco Pracucci <[email protected]>

* Upstream some config improvements

Signed-off-by: Marco Pracucci <[email protected]>

* Increased max connections for memcached chunks and index-queries too

Signed-off-by: Marco Pracucci <[email protected]>

* Ruler: Pass `-ruler-storage.s3.endpoint` to ruler when using S3.

This argument is required; without it, the following error appears:

```
no s3 endpoint in config file
```

* Allow to create custom store-gateway StatefulSets via newStoreGatewayStatefulSet()

Signed-off-by: Marco Pracucci <[email protected]>

* Fix newStoreGatewayStatefulSet() to use input container

Signed-off-by: Marco Pracucci <[email protected]>

* Add CI check for jsonnet manifests

* Remove additional git diff in check-mixin

* Imported cortex-jsonnet CHANGELOG entries from 1.9.0

Signed-off-by: Marco Pracucci <[email protected]>

* Improved CHANGELOG header

Signed-off-by: Marco Pracucci <[email protected]>

Co-authored-by: Marco Pracucci <[email protected]>
Co-authored-by: Austin McKinley <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Jacob Lisi <[email protected]>
Co-authored-by: Austin McKinley <[email protected]>
Co-authored-by: Goutham Veeramachaneni <[email protected]>
Co-authored-by: Peter Štibraný <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: Joe Elliott <[email protected]>
Co-authored-by: Duologic <[email protected]>
Co-authored-by: Jeroen Op 't Eynde <[email protected]>
Co-authored-by: Sandeep Sukhani <[email protected]>
Co-authored-by: Ganesh Vernekar <[email protected]>
Co-authored-by: Stan Kwong <[email protected]>
Co-authored-by: gotjosh <[email protected]>
Co-authored-by: forestsword <[email protected]>
Co-authored-by: Jacob Lisi <[email protected]>
Co-authored-by: Alex Martin <[email protected]>
Co-authored-by: Tom Wilkie <[email protected]>
Co-authored-by: Jack Baldry <[email protected]>
Co-authored-by: Victor Tsang Hi <[email protected]>
Co-authored-by: Victor Tsang Hi <[email protected]>
Co-authored-by: Nick Pillitteri <[email protected]>
Co-authored-by: Steve Simpson <[email protected]>
Co-authored-by: Hamish <[email protected]>
Co-authored-by: Javier Palomo <[email protected]>
Co-authored-by: gotjosh <[email protected]>
Co-authored-by: Oleg Zaytsev <[email protected]>
Co-authored-by: Kaviraj <[email protected]>
Co-authored-by: Arve Knudsen <[email protected]>
