Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: collect and serve pre-aggregated bytes and counts #13020

Merged
merged 47 commits into from
Jun 19, 2024

Conversation

trevorwhitney
Copy link
Collaborator

@trevorwhitney trevorwhitney commented May 22, 2024

What this PR does / why we need it:

This PR pre-aggregates bytes and count per-stream in the pattern ingester to enable the /explore/query_range endpoint to be able to serve bytes_over_time and count_over_time queries. The goal of this work is to speed up histogram queries, and eventually figure out a way to add pre-aggregated metrics to Loki streams.

The PR introduces a StepEvaluator for these aggregated metrics so that all grouping and range selection operations should work as expected.

Future work will be to make this new /explore/query_range endpoint fallback on the normal /query_range endpoint when it can't service a query, thus allowing the Explore App to use this new endpoint exactly like it is the current one.

TODOs:

  • check for filters in queries and reject them
  • put new behavior behind a feature flag
  • compare results of new endpoint with query range in dev-005

@trevorwhitney trevorwhitney changed the title feat: collect and serve pre-aggregated bytes and count feat: collect and serve pre-aggregated bytes and counts May 22, 2024
* pre-aggregate bytes and count per stream in the pattern ingester
* serve bytes_over_time and count_over_time queries from the patterns
  endpoint
@trevorwhitney trevorwhitney marked this pull request as ready for review May 23, 2024 23:04
@trevorwhitney trevorwhitney requested a review from a team as a code owner May 23, 2024 23:04
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, but I think there's too much coupling API wise. We should probably have another GRPC and HTTP API.

And this should be transparent for the users.

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label May 31, 2024
@trevorwhitney
Copy link
Collaborator Author

trevorwhitney commented Jun 3, 2024

I have tested this in loki-dev-005, results between v1/query_range and /v1/explore/query_range are identical for the same query, time period, and step. Performance isn't much better using the new endpoint, but I think that's largely because the size of the dataset (being much smaller in this case) and because we're not down sampling yet.

there was a bug, more test results to follow

Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only started. As I'm new what is the QuerySampleRequest doing?

@@ -178,6 +193,7 @@ func (r *batchRangeVectorIterator) load(start, end int64) {
series.Metric = metric
r.window[lbs] = series
}
// TODO(twhitney): Everywhere else, an FPoint.T is in milliseconds, but here it's in nanoseconds.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a TODO that's addressable? I tried to refactor our protos to use the duration type some time ago. However, it was very invasive in the whole code base.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not something I'm going to touch in this already large PR, but yes, I think we can address this one usage. My understanding is that it doesn't affect the result because this is inside a step evaluator, and the step evaluator's .Next() returns the timestamp we use in the actual response, which is correct. I just want to fix this for consistency, but, in a separate PR.

pkg/pattern/metric/evaluator.go Show resolved Hide resolved
Comment on lines +98 to +106
if s.cfg.LogPushObservations {
level.Debug(s.logger).
Log("msg", "observing pushed log entries",
"stream", s.labelsString,
"bytes", bytes,
"count", count,
"sample_ts_ns", s.lastTs,
)
}
Copy link
Contributor

@cyriltovena cyriltovena Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to log this one ?

@trevorwhitney
Copy link
Collaborator Author

trevorwhitney commented Jun 18, 2024

Thanks to @cyriltovena's pruning fix it's working!

Here's a comparison, tested on loki-dev-010 using a 15m query for count_over_time({service_name="unknown_service"}[10s]) with a 10s step. Result values are the same. Looks like there's a slight difference with the timestamps (some truncating in the existing code path?), but we can address that later pushed a commit to fix. Going to keep running it to get more data.

new.json -- using the new /explore/query_range endpoint
old.json -- using the old /query_range endpoint

@trevorwhitney
Copy link
Collaborator Author

This is already a lot to merge, so I'd like to get this in without changing too much else. Everything's behind a feature flag so we can decide when to turn it on. Next steps (for future PRS) will include:

  • more efficient storage format for metrics in memory
  • writing metrics to object storage

@trevorwhitney
Copy link
Collaborator Author

trevorwhitney commented Jun 18, 2024

A more extensive test (1hr with 1m range and 1m step). There are some differences, but it looks like mainly the new endpoint missing datapoints which is to be expected. For the datapoints it has the results look to be the same.

bytes_by_pod_new.json
bytes_by_pod_old.json
bytes_new.json
bytes_old.json
count_new.json
count_old.json

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to unblock but I see two risk.

  1. At scale this is going to cost memory.
  2. How do we make query reliable for rollout on rf1 ? Pattern are experimental feature but not histogram volumes ? How can we detect missing data ?

@cyriltovena cyriltovena merged commit 467eb1b into main Jun 19, 2024
61 checks passed
@cyriltovena cyriltovena deleted the sample-count-and-bytes branch June 19, 2024 09:56
trevorwhitney added a commit that referenced this pull request Jul 31, 2024
trevorwhitney added a commit that referenced this pull request Aug 12, 2024
commit b9e647f
Author: Trevor Whitney <[email protected]>
Date:   Mon Aug 12 11:31:31 2024 -0600

    Squashed commit of the following:

    commit c6ab6b3
    Author: Trevor Whitney <[email protected]>
    Date:   Mon Aug 12 11:09:43 2024 -0600

        chore: remove initial metric aggregation experiment (#13729)

    commit 3c0e3e2
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 12:00:51 2024 -0400

        fix(deps): update module github.com/baidubce/bce-sdk-go to v0.9.186 (#13864)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 6f79194
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 10:20:34 2024 -0400

        fix(deps): update module github.com/aliyun/aliyun-oss-go-sdk to v2.2.10+incompatible (#13861)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit ad60738
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 09:53:11 2024 -0400

        chore(deps): update grafana/loki-build-image docker tag to v0.33.6 (#13859)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 292f911
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 09:46:36 2024 -0400

        chore(deps): update helm/chart-testing-action action to v2.6.1 (#13855)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit e0bde12
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 09:45:49 2024 -0400

        chore(deps): update grafana/promtail docker tag to v1.6.1 (#13851)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit a9cd5e5
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 09:42:53 2024 -0400

        chore(deps): update grafana/promtail docker tag to v2.9.10 (#13854)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 0c28cc7
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 09:42:21 2024 -0400

        chore(deps): update dependency go to v1.22.6 (#13842)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
        Co-authored-by: Paul Rogers <[email protected]>

    commit c3a5141
    Author: Paul Rogers <[email protected]>
    Date:   Mon Aug 12 09:29:19 2024 -0400

        chore: Update loki build image to go 1.22.6 (#13857)

    commit 717623b
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Mon Aug 12 15:47:29 2024 +0300

        chore:  manual changelog backport (#13852)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit f933a3b
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 08:40:39 2024 -0400

        chore(deps): update grafana/loki docker tag to v2.9.10 (#13848)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
        Co-authored-by: Paul Rogers <[email protected]>

    commit ea6395a
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 08:33:10 2024 -0400

        chore(deps): update golang docker tag to v1.22.6 (#13847)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 53c0c48
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Mon Aug 12 08:32:08 2024 -0400

        chore(deps): update grafana/loki-build-image docker tag to v0.33.5 (#13849)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 9315b3d
    Author: Grot (@grafanabot) <[email protected]>
    Date:   Fri Aug 9 12:38:47 2024 +0200

        chore(ci): Update yaml file `./production/helm/loki/values.yaml` (+1 other) (#13832)

        Signed-off-by: Vladyslav Diachenko <[email protected]>
        Co-authored-by: Vladyslav Diachenko <[email protected]>

    commit 66d7138
    Author: Grot (@grafanabot) <[email protected]>
    Date:   Thu Aug 8 17:31:41 2024 +0200

        chore: [main] chore(release-3.1.x): release 3.1.1 (#13817)

        Co-authored-by: loki-gh-app[bot] <160051081+loki-gh-app[bot]@users.noreply.github.com>

    commit d5718eb
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Thu Aug 8 10:49:41 2024 -0400

        fix(deps): update github.com/grafana/jsonparser digest to ea80629 (#13814)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit f253db5
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Thu Aug 8 17:12:11 2024 +0300

        fix(ci): fixed release-please manifest (#13810)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit a93f38c
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Thu Aug 8 09:48:10 2024 -0400

        fix(deps): update github.com/c2h5oh/datasize digest to aa82cc1 (#13807)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
        Co-authored-by: Paul Rogers <[email protected]>

    commit e5a3994
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Thu Aug 8 09:39:56 2024 -0400

        fix(deps): update github.com/docker/go-plugins-helpers digest to 45e2431 (#13808)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 67295e0
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Thu Aug 8 09:33:48 2024 -0400

        fix(deps): update github.com/axiomhq/hyperloglog digest to af9851f (#13806)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit c9b0343
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Thu Aug 8 09:32:02 2024 -0400

        chore(deps): update github.com/grafana/regexp digest to a468a5b (#13805)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit 5bb928e
    Author: Paul Rogers <[email protected]>
    Date:   Thu Aug 8 08:52:50 2024 -0400

        chore: Turn off renovate on non-main branches (#13803)

    commit 217f928
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Thu Aug 8 15:31:43 2024 +0300

        fix(ci): add cleanup step into job `dist` (#13801)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit 00e686d
    Author: benclive <[email protected]>
    Date:   Thu Aug 8 10:40:16 2024 +0100

        chore: Add metastore client as dep for rf1 querier & ignore auth for ListBlocks (#13786)

    commit df61482
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Wed Aug 7 18:12:13 2024 -0400

        fix(deps): update module golang.org/x/text to v0.17.0 (main) (#13794)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit f523530
    Author: Periklis Tsirakidis <[email protected]>
    Date:   Wed Aug 7 20:05:42 2024 +0200

        fix(operator): Don't overwrite annotations for LokiStack ingress resources (#13708)

    commit 5ef83a7
    Author: Ned Andreev <[email protected]>
    Date:   Wed Aug 7 19:44:43 2024 +0300

        fix: panic when parsing and extracting JSON key values (#13790)

    commit bb257f5
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Wed Aug 7 17:39:04 2024 +0300

        feat(loki):  add ability to disable AWS S3 dualstack endpoints usage  (#13785)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit 1bf9791
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Wed Aug 7 16:10:34 2024 +0300

        fix(helm): fixed memcached and provisioner templates (#13788)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit 638f59f
    Author: benclive <[email protected]>
    Date:   Wed Aug 7 14:06:28 2024 +0100

        chore: Remove unused stream index from RF1 ingester (#13758)

    commit 7683a79
    Author: Ned Andreev <[email protected]>
    Date:   Tue Aug 6 20:26:10 2024 +0300

        fix: Include whitespaces in extracted tokens (#13738)

        Co-authored-by: Travis Patterson <[email protected]>

    commit da63ca7
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Tue Aug 6 12:41:48 2024 -0400

        chore(deps): update module golang.org/x/net to v0.23.0 [security] (main) (#13763)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
        Co-authored-by: Paul Rogers <[email protected]>

    commit f8bf3bb
    Author: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
    Date:   Tue Aug 6 12:29:47 2024 -0400

        fix(deps): update module github.com/docker/docker to v27.1.1+incompatible [security] (main) (#13762)

        Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

    commit b88583d
    Author: Marco Pracucci <[email protected]>
    Date:   Tue Aug 6 16:04:28 2024 +0200

        feat: upgrade prometheus (#13671)

        Signed-off-by: Marco Pracucci <[email protected]>
        Signed-off-by: Vladyslav Diachenko <[email protected]>
        Co-authored-by: Vladyslav Diachenko <[email protected]>

    commit 3be5a45
    Author: Dylan Guedes <[email protected]>
    Date:   Mon Aug 5 17:59:21 2024 -0300

        fix(break): helm: Fix how we set imagePullSecrets for enterprise-gateway and admin-api. (#13761)

    commit bdfc86b
    Author: Vladyslav Diachenko <[email protected]>
    Date:   Mon Aug 5 13:46:10 2024 +0300

        chore(helm-chart): added SSE config into AWS storage config (#13746)

        Signed-off-by: Vladyslav Diachenko <[email protected]>

    commit 7e224d5
    Author: Sandeep Sukhani <[email protected]>
    Date:   Sun Aug 4 19:28:29 2024 +0530

        fix: try reading chunks which have incorrect offset for blocks (#13720)

    commit 629671e
    Author: J Stickler <[email protected]>
    Date:   Fri Aug 2 16:58:41 2024 -0400

        docs: Update the Visualize topic (#13742)

    commit 917053a
    Author: Cyril Tovena <[email protected]>
    Date:   Fri Aug 2 18:07:17 2024 +0200

        feat: Introduce wal segment read path. (#13695)

        Co-authored-by: Ben Clive <[email protected]>

    commit 7c50b43
    Author: Paul Rogers <[email protected]>
    Date:   Fri Aug 2 08:47:33 2024 -0400

        build: Update loki-build-image to Alpine 3.20.2 (#13744)

    commit 6dd6b65
    Author: jackyin <[email protected]>
    Date:   Fri Aug 2 01:30:11 2024 +0800

        fix: ast left cycular reference result in oom (#13501)

        Co-authored-by: Travis Patterson <[email protected]>

    commit e81345e
    Author: J Stickler <[email protected]>
    Date:   Thu Aug 1 11:29:38 2024 -0400

        docs: fix broken links due to Alloy docs reorg (#13715)

    commit 40e8352
    Author: Jack Baldry <[email protected]>
    Date:   Thu Aug 1 15:41:48 2024 +0100

        docs: rewrite quickstart with Killercoda metadata (#13234)

        Signed-off-by: Jack Baldry <[email protected]>
        Co-authored-by: J Stickler <[email protected]>
        Co-authored-by: Jay Clifford <[email protected]>

commit c65ba1f
Author: Trevor Whitney <[email protected]>
Date:   Mon Aug 12 11:08:37 2024 -0600

    chore: rework pattern ingester queue metrics

commit b28572a
Author: Trevor Whitney <[email protected]>
Date:   Fri Aug 2 13:16:01 2024 -0600

    fix: change metrics around metric appends

commit e430a96
Author: Trevor Whitney <[email protected]>
Date:   Fri Aug 2 08:59:05 2024 -0600

    fix: lint and format

commit 2f5485d
Author: Trevor Whitney <[email protected]>
Date:   Fri Aug 2 08:45:57 2024 -0600

    feat: improve async tee request processing

commit 3dcc50d
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 15:16:13 2024 -0600

    test: add push and downsample test

commit 019df76
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 14:41:41 2024 -0600

    fix: formatting

commit 7d98cb0
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 13:35:55 2024 -0600

    chore: move some functions around

commit 06269b0
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 13:29:57 2024 -0600

    test: mock out ring in tests

commit 07d57c3
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 11:56:34 2024 -0600

    fix: formatting

commit 93e4add
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 11:53:42 2024 -0600

    fix: remove now unused test

commit c03db1f
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 11:49:59 2024 -0600

    fix: remove duplicate service detection for distributor

commit fe447f5
Author: Trevor Whitney <[email protected]>
Date:   Thu Aug 1 11:45:31 2024 -0600

    fix: service name detection on push

commit afb8513
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 21:40:39 2024 -0600

    feat: introduce RingClient abstraction

    * lint and format

commit ce87fd8
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 20:41:38 2024 -0600

    feat: aggregate byte and count metrics

    * aggregate in the pattern ingester and push back into Loki as a stream

commit b49e52b
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 14:36:59 2024 -0600

    chore: fix formatting

commit af22a21
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 14:32:05 2024 -0600

    chore: fix linting

commit d51cecc
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 14:12:31 2024 -0600

    chore: remove the metric aggregation experiment

    * this will make way for a new, simpler approach

commit dad6fb5
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 13:39:07 2024 -0600

    Revert "feat: collect and serve pre-aggregated bytes and counts (#13020)"

    This reverts commit 467eb1b.

commit a6835e1
Author: Trevor Whitney <[email protected]>
Date:   Wed Jul 31 13:18:12 2024 -0600

    Revert "feat: downsample aggregated metrics (#13449)"

    This reverts commit 2c053ee.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants