[Usage Counters] Enhancements to the APIs by gsoldevila · Pull Request #187665 · elastic/kibana

gsoldevila · 2024-07-05T12:14:47Z

Summary

Part of #186530
Follow-up of #187064

The goal of this PR is to provide the necessary means to allow implementing the Counting views part of the Dashboards++ initiative.
We do this by extending the capabilities of the usage counters APIs:

We support custom retention periods. Currently, data is only kept in SO indices for 5 days, whereas Dashboards++ requires 90 days' worth of counts.
We expose a Search API that will allow retrieving persisted counters.

elasticmachine · 2024-07-09T14:24:45Z

Pinging @elastic/kibana-core (Team:Core)

pgayvallet · 2024-07-10T13:33:37Z

src/plugins/usage_collection/tsconfig.json

Why do we need this internal Core module import (@kbn/core-saved-objects-api-server-internal)? Is that only to stub getCurrentTime during the integration tests?

Yes I'm afraid so, it bothers me too.
Perhaps I can try to find a cleaner way to insert old counters for testing purposes.

In f90542c I used the standard incrementCounter to create the counters, and then I used esClient.updateByQuery() to modify their updated_at dates. Not ideal, but cleaner than the mock IMO, and addresses your feedback above.
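The backdating step described above could look roughly like the following sketch. It only builds the request that would be handed to esClient.updateByQuery(). The helper name buildBackdateRequest, the index name passed in, and the 'usage-counter.domainId' field path are assumptions made for illustration and are not taken from the PR.

```typescript
// Shape of the request we would pass to esClient.updateByQuery().
interface BackdateRequest {
  index: string;
  refresh: boolean;
  query: { term: Record<string, string> };
  script: { lang: string; source: string; params: { updatedAt: string } };
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: builds a request that moves `updated_at` into the past
// for every counter of a given domain.
function buildBackdateRequest(
  index: string,
  domainId: string,
  daysAgo: number,
  now: Date = new Date()
): BackdateRequest {
  const updatedAt = new Date(now.getTime() - daysAgo * DAY_MS).toISOString();
  return {
    index,
    // Make the change visible to the searches that follow in the test.
    refresh: true,
    // Assumption: counter attributes are stored under the "usage-counter" key.
    query: { term: { 'usage-counter.domainId': domainId } },
    script: {
      lang: 'painless',
      source: 'ctx._source.updated_at = params.updatedAt',
      params: { updatedAt },
    },
  };
}
```

In an integration test, the counters would first be created through incrementCounter, and the request built here would then be sent with the test's Elasticsearch client.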

gsoldevila · 2024-07-12T12:29:05Z

src/plugins/usage_collection/server/plugin.ts

@afharo @Bamieh I have extracted all of the CollectorSet methods into a separate interface ICollectorSet, in an effort to make it cleaner / clearer that the usage-collection plugin offers 2 different things in the setup contract.

gsoldevila · 2024-07-16T16:33:26Z

src/plugins/kibana_usage_collection/server/plugin.ts

I moved the rollups logic (i.e. deleting counters older than 5 days) to the usage_collection plugin. IMO it makes more sense to have it there:

It is specific to usage counters, whereas kibana_usage_collection handles plenty of other collectors.

If someone disabled kibana_usage_collection, usage counters would still be captured but never cleaned up, so they would be persisted indefinitely.

WDYT?

Makes sense to me! Thanks!

pgayvallet

Implementation looks fine to me, but I'm not the one with the most knowledge of this area, so a second review would probably make sense

pgayvallet · 2024-07-16T14:01:33Z

src/plugins/usage_collection/server/usage_counters/usage_counters_service.ts

NIT: we're not following our service pattern here, this method shouldn't be public / used directly. But it's not really that significant.

++ ATM I am working on some enhancements that include making this private.
UPDATE: Addressed with fe9fb62

afharo

Overall LGTM. I added a few comments that I'd like to discuss before approving.

afharo · 2024-07-19T13:53:17Z

src/plugins/kibana_usage_collection/server/plugin.ts

Makes sense to me! Thanks!

afharo · 2024-07-19T14:03:15Z

src/plugins/usage_collection/server/plugin.ts

nit: how about returning an empty response instead? I wonder if throwing this error will create the need for plugins to check if it's enabled (and we don't provide an API to share if it's enabled or not).

Alternatively, we could allow searching, only that, when enabled: false, we don't store more data. WDYT?

The problem is I only obtain the search method when calling the service.start(). Either:

I call start (and we'll have a few RxJS timers running without buffering any events)

Or I make the search method public (we break consistency with other services).

Or I expose the search in the response of the stop() hook.

I chose the 3rd option. Fixed in f58e23b

afharo · 2024-07-19T14:17:03Z

src/plugins/usage_collection/server/usage_counters/rollups/rollups.ts

hmm, I don't think this approach is valid anymore...

If we have 12 dashboards viewed every day during the last 90 days, they'll show up in the first page and we won't remove others...

I wonder if we should change this to group per retention days (domainId: 'dashboard' and updated_at < retentionPeriodDays vs. not domainId: 'dashboard' and updated_at < USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS).

Good catch! We'll have to think about a better strategy.

After discussion, we can do a search by domainId, and filter by updated_at < (now - retentionPeriodDays).
We can then loop through the different domain IDs.
Will assess bulkDelete following @TinaHeiligers's latest comment.

@afharo updated with 357b474
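The per-domain strategy agreed on in this thread can be sketched in memory as follows. This is an illustration under assumptions, not the code from 357b474: the StoredCounter and DomainConfig shapes and the findExpiredCounters name are hypothetical, and the real rollup runs one search per domain against the saved objects index instead of filtering an array.

```typescript
// Default retention, named after the constant mentioned in this thread
// (5 days, per the PR summary).
const USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS = 5;
const DAY_MS = 24 * 60 * 60 * 1000;

interface StoredCounter {
  id: string;
  domainId: string;
  updatedAt: string; // ISO timestamp
}

interface DomainConfig {
  domainId: string;
  retentionPeriodDays?: number; // falls back to the default when omitted
}

// Loop through the domains; for each one, select the counters whose
// updated_at is older than (now - retentionPeriodDays).
function findExpiredCounters(
  stored: StoredCounter[],
  domains: DomainConfig[],
  now: Date
): StoredCounter[] {
  const expired: StoredCounter[] = [];
  for (const { domainId, retentionPeriodDays } of domains) {
    const days = retentionPeriodDays ?? USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS;
    const cutoff = now.getTime() - days * DAY_MS;
    for (const counter of stored) {
      if (counter.domainId === domainId && Date.parse(counter.updatedAt) < cutoff) {
        expired.push(counter);
      }
    }
  }
  return expired;
}
```

Because each domain is evaluated against its own cutoff, long-lived dashboard counters no longer crowd out the short-lived ones in the first page of results.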

afharo · 2024-07-19T14:18:12Z

src/plugins/usage_collection/server/usage_counters/rollups/rollups.ts

nit: we can do it in a follow up: but we can now use the bulkDelete API :)

We'll see about that, cause I believe the bulkDelete does not allow deleting from multiple namespaces.

It will when you add 'force'. Please check that the option hasn't changed.

afharo · 2024-07-19T14:21:43Z

src/plugins/usage_collection/server/usage_counters/search/search.ts

should we use PIT search?

Totally, added with #187665 (comment)

afharo · 2024-07-19T14:24:21Z

src/plugins/usage_collection/server/usage_counters/usage_counter.ts

nit: I don't see us defaulting it anywhere here.

How about defaulting retentionPeriodDays = USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS if not provided?

It's defaulted in the rollup logic, this way the in-memory UsageCounter's are lighter (no need to store the property if it matches the default value).

afharo · 2024-07-19T14:45:36Z

src/plugins/usage_collection/server/usage_counters/search/search.ts

IMO, aggregations might better achieve what we're after here.

I imagine an API like:

    usageCounters.search({
      filters: {
        domainId,
        counterName,
        counterType,
        source,
        timestamp: { [lt|lte|gt|gte]: Date }
      },
      aggregation_keys: ['domainId', 'counterName', 'counterType', 'source', 'timestamp']
    });

That we internally map to an aggregation call and return flattened.

So... for a counter with this structure:

{ "domainId": "dashboard", "counterName": "<dashboard-id>", "counterType": "views", "source": "server" }

If I only care about the grand total of my N-day retention period for all my dashboards, I'd call the API like

    usageCounters.search({
      filters: {
        domainId: "dashboard",
        counterType: "views",
        source: "server",
      },
      aggregation_keys: []
    });

And I'd get a response like

    {
      counters: [
        {
          domainId: "dashboard",
          counterType: "views",
          source: "server",
          count: 9999999,
        }
      ]
    }

If I want the grand total for each dashboard, I'd call the API like

    usageCounters.search({
      filters: {
        domainId: "dashboard",
        counterType: "views",
        source: "server",
      },
      aggregation_keys: ['counterName']
    });

And I'd get a response like

    {
      counters: [
        {
          domainId: "dashboard",
          counterType: "views",
          source: "server",
          counterName: "dashboard-1",
          count: 10,
        },
        {
          domainId: "dashboard",
          counterType: "views",
          source: "server",
          counterName: "dashboard-2",
          count: 9999989,
        }
      ]
    }

If I want the histogram for a specific dashboard and time range, I'd call the API like

    usageCounters.search({
      filters: {
        domainId: "dashboard",
        counterType: "views",
        counterName: "dashboard-2",
        source: "server",
        timestamp: {
          gte: "2024-07-01T00:00:00.000Z",
          lte: "2024-07-05T00:00:00.000Z",
        }
      },
      aggregation_keys: ['timestamp']
    });

And I'd get a response like

    {
      counters: [
        {
          domainId: "dashboard",
          counterType: "views",
          counterName: "dashboard-2",
          source: "server",
          timestamp: "2024-07-01T00:00:00.000Z",
          count: 10,
        },
        {
          domainId: "dashboard",
          counterType: "views",
          counterName: "dashboard-2",
          source: "server",
          timestamp: "2024-07-03T00:00:00.000Z",
          count: 9999989,
        }
      ]
    }

The benefits of the aggregations are:

No pagination required

Can be requested at any level

WDYT?
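To make the proposed semantics concrete, here is a minimal in-memory sketch of grouping by aggregation_keys. It is not the suggested Elasticsearch aggregation: the aggregateCounters name and the CounterDoc shape are hypothetical, the timestamp key is left out for brevity, and each result row carries only the requested keys plus the summed count.

```typescript
type AggregationKey = 'domainId' | 'counterName' | 'counterType' | 'source';

interface CounterDoc {
  domainId: string;
  counterName: string;
  counterType: string;
  source: string;
  count: number;
}

type AggregatedCounter = Partial<Record<AggregationKey, string>> & { count: number };

// Group documents by the requested keys and sum their counts.
// An empty key list yields a single grand-total row.
function aggregateCounters(docs: CounterDoc[], keys: AggregationKey[]): AggregatedCounter[] {
  const buckets = new Map<string, AggregatedCounter>();
  for (const doc of docs) {
    const bucketId = JSON.stringify(keys.map((key) => doc[key]));
    let bucket = buckets.get(bucketId);
    if (!bucket) {
      bucket = { count: 0 };
      for (const key of keys) {
        bucket[key] = doc[key];
      }
      buckets.set(bucketId, bucket);
    }
    bucket.count += doc.count;
  }
  return Array.from(buckets.values());
}
```

Calling it with an empty key list mirrors the grand-total example above, while passing ['counterName'] mirrors the per-dashboard one.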

It's an interesting proposal!

In the current use case, Anton still has to retrieve individual days counters to show them in the UI, so I went for the simplest approach possible, and let him aggregate counts on his side (saving 1 call in the process).

But I agree that it seems desirable to perform the aggregations on our side long term, makes for a more elegant API. Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d).

Let's discuss this offline!

> Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d).

AFAIK, the recommended way to paginate is via PIT for various reasons:

The from + size technique is limited to 10_000 entries: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html (no matter if they are 10 pages of 1000 or 1000 pages of 10)

If documents are indexed or updated while paginating, the results get re-sorted, so fetching the 2nd and later pages may return documents that were already fetched.

AFAIK, we'll always retrieve all values queried, since the intention is not to show these values in a table that the user can paginate. So I don't think we're saving ourselves from any potential issues.
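The paging loop that PIT-based retrieval relies on can be sketched independently of the client. In the sketch below, fetchPage is a hypothetical stand-in for a search request pinned to an open point in time and sorted by a stable tiebreaker; it receives the sort values of the last hit of the previous page (the search_after cursor) and returns the next page. The fetchAll name and the Page shape are assumptions for illustration.

```typescript
interface Page<T> {
  hits: T[];
  // Sort values of the last hit, used as the search_after cursor.
  lastSort?: unknown[];
}

type FetchPage<T> = (searchAfter: unknown[] | undefined, size: number) => Promise<Page<T>>;

// Keep requesting pages until one comes back short, carrying the cursor forward.
async function fetchAll<T>(fetchPage: FetchPage<T>, pageSize: number): Promise<T[]> {
  const all: T[] = [];
  let searchAfter: unknown[] | undefined;
  for (;;) {
    const { hits, lastSort } = await fetchPage(searchAfter, pageSize);
    all.push(...hits);
    if (hits.length < pageSize || lastSort === undefined) {
      break; // a short (or empty) page means there is nothing left
    }
    searchAfter = lastSort;
  }
  return all;
}
```

Unlike from + size, this loop has no 10,000-hit ceiling, and because every page is read from the same point in time, concurrent writes cannot reshuffle results between pages.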

> Anton still has to retrieve individual days counters to show them in the UI

@Dosant, just FMI, will you retrieve all days for the histogram, and add them up to get the total? Or will the histogram be of the last 7 days and the total will account for the entire retention period (30d? 90d?)

> Let's discuss this offline!

Sure! Happy to meet when you're back :)

@afharo, I retrieve the last 90 days using from, sum them up to get a total to display as "Views in last 90 days", and bucket them into weeks to display a weekly histogram (#187993).
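The client-side aggregation described in the previous comment might look like this sketch. It is not the code from #187993: the summarizeViews name, the DailyCount shape, and the choice to measure weeks backwards from now are all assumptions made for illustration.

```typescript
interface DailyCount {
  day: string; // ISO timestamp of the day the counter belongs to
  count: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Sum the daily counters inside the window and bucket them into weeks.
// weekly[0] is the most recent week, weekly[1] the one before, and so on.
function summarizeViews(
  daily: DailyCount[],
  now: Date,
  windowDays = 90
): { total: number; weekly: number[] } {
  const weekly: number[] = new Array(Math.ceil(windowDays / 7)).fill(0);
  let total = 0;
  for (const { day, count } of daily) {
    const ageDays = Math.floor((now.getTime() - Date.parse(day)) / DAY_MS);
    if (ageDays < 0 || ageDays >= windowDays) {
      continue; // outside the retention window
    }
    total += count;
    weekly[Math.floor(ageDays / 7)] += count;
  }
  return { total, weekly };
}
```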

After discussion with @afharo, we agreed that:

We can leave aggregations for a later phase. I prepared the ground by encapsulating current filters under filters property.

We must implement PIT search.

These changes have been implemented in d8e86e4

elasticmachine · 2024-07-19T16:58:53Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 0240e5716a7a9547af04d37c8b598c068dcf59db

Failed CI Steps

Jest Integration Tests #5

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`usageCollection`	16	14	-2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`usageCollection`	2	4	+2

Unknown metric groups

API count

id	before	after	diff
`usageCollection`	56	50	-6

History

💚 Build #222027 succeeded 54c6a1252d2fc537072540eacedd2634a64a1b47
💚 Build #221977 succeeded b18505344cc7c115388daa9a5bb2dff9253c081b
💔 Build #221842 failed 27676f964e053139b5abb1ecca44df3eb525b3d4
💚 Build #221380 succeeded f9c4b4fcdee5c35a085e7a9c3e11c2cda4aae1ec
💔 Build #221330 failed f4e8e60e6bb69907d8b88cbdcdb428ea4fc759c4
💔 Build #221124 failed bd5f093367e32b29e1acc80b3c98096cb623d6fe


afharo

LGTM

afharo · 2024-08-01T12:47:13Z

x-pack/plugins/cloud/server/collectors/cloud_usage_collector.ts


 export function createCloudUsageCollector(
-  usageCollection: UsageCollectionSetup,
+  usageCollection: ICollectorSet,


nit: I think this is still UsageCollectionSetup. It's getting the entire plugin contract.
Probably what triggered this is the name ICollectorSet. It sounds a bit confusing for a CollectorSet to be needed to create a usage collector...

WDYT?

I thought about that for a while, and I think the root problem is "naming is hard".

IMO it would make more sense to name it CollectorManager instead of ICollectorSet.
Then, UsageCollectionSetup would have both the CollectorManager & UsageCountersServiceSetup.
NB the code above does not need anything about the UsageCounters.

The problem is that lots of plugins are already using the global UsageCollectionSetup, so changing all the references to the more specific CollectorManager would take some work + codeowners approvals.
I only changed this one cause it falls under our ownership.

I can rollback these changes and we can tackle this on a separate PR.

UPDATE: rolled back with b8abddb

afharo · 2024-08-02T14:07:21Z

src/plugins/usage_collection/server/usage_counters/rollups/rollups.ts

+        if (toDelete.length === ROLLUP_BATCH_SIZE) {
+          // we found a lot of old Usage Counters, put the counter back in the queue, as there might be more
+          counterQueue.push(counter);
+        }
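The re-queue step in the snippet above can be illustrated in isolation. The following is a simplified, synchronous sketch: drainExpired, findExpired and remove are hypothetical names, the batch size is passed in as a parameter instead of being read from ROLLUP_BATCH_SIZE, and the real rollup awaits saved objects calls where this sketch invokes plain callbacks.

```typescript
interface CounterRef {
  domainId: string;
}

// Process counters one by one. A full batch suggests more expired documents
// remain, so the counter goes back to the end of the queue for another pass.
function drainExpired(
  counters: CounterRef[],
  findExpired: (counter: CounterRef, limit: number) => string[],
  remove: (ids: string[]) => void,
  batchSize: number
): number {
  const queue = [...counters];
  let deleted = 0;
  while (queue.length > 0) {
    const counter = queue.shift()!;
    const toDelete = findExpired(counter, batchSize);
    if (toDelete.length === 0) {
      continue;
    }
    remove(toDelete);
    deleted += toDelete.length;
    if (toDelete.length === batchSize) {
      // We found a lot of old counters; there might be more of them.
      queue.push(counter);
    }
  }
  return deleted;
}
```

The loop terminates because a counter is only re-queued after a full batch has actually been removed, so each extra pass either shrinks the backlog or finds nothing and drops the counter from the queue.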


davismcphee

Data Discovery changes LGTM

Dosant

x-pack/plugins/reporting/server/routes/ change due to interface changes lgtm.

~~I haven't rebased my frontend work yet to check if everything is still looking good, I'll try today or tomorrow. but don't want to block, up to you if you'd like to wait~~ Looks good!

kibana-ci · 2024-08-05T14:33:28Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: ea10c24

Failed CI Steps

FTR Configs #19

Test Failures

[job] [logs] FTR Configs #19 / Dataset Quality Dataset quality table filters shows full dataset names when toggled

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`usageCollection`	16	14	-2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`usageCollection`	2	4	+2

Unknown metric groups

API count

id	before	after	diff
`usageCollection`	56	51	-5

History

💛 Build #225538 was flaky 9b9f007
💔 Build #225497 failed c62399f
💔 Build #225386 failed 357b474
💛 Build #225307 was flaky f58e23b

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

gsoldevila added the Team:Core, Feature:Telemetry, release_note:skip, backport:skip, and v8.16.0 labels Jul 5, 2024

gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from 9596133 to e2eb96d Compare July 8, 2024 11:04

gsoldevila changed the title ~~Draft~~ [Usage Counters] Add API to support searching / retrieving persisted usage-counters Jul 8, 2024

gsoldevila force-pushed the kbn-usage-counters-search-api branch from e2eb96d to 9385678 Compare July 8, 2024 14:56

gsoldevila changed the title ~~[Usage Counters] Add API to support searching / retrieving persisted usage-counters~~ [Usage Counters] Enhancements to the APIs Jul 9, 2024

gsoldevila marked this pull request as ready for review July 9, 2024 14:24

gsoldevila requested a review from a team as a code owner July 9, 2024 14:24

gsoldevila force-pushed the kbn-usage-counters-search-api branch from c99c82a to f0904a4 Compare July 10, 2024 10:34

pgayvallet reviewed Jul 10, 2024


gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from a115044 to dfd4335 Compare July 12, 2024 12:26

gsoldevila commented Jul 12, 2024


gsoldevila force-pushed the kbn-usage-counters-search-api branch 4 times, most recently from f9c4b4f to f90542c Compare July 16, 2024 16:27

gsoldevila commented Jul 16, 2024


gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from 8937136 to b185053 Compare July 17, 2024 07:42

pgayvallet approved these changes Jul 17, 2024


gsoldevila mentioned this pull request Jul 17, 2024

[Dashboards++] Usage counters enhancements #186530

Closed

3 tasks

afharo reviewed Jul 19, 2024


gsoldevila force-pushed the kbn-usage-counters-search-api branch from 54c6a12 to 0240e57 Compare July 19, 2024 15:35

gsoldevila and others added 11 commits August 1, 2024 13:57

UsageCountersService no longer registers the SO types

a64a986

Fix flaky test and buggy mock

f9db2c7

Refactor interfaces, extract CollectorSet off of UsageCollection plugin

f0bb528

Filter official counters from integration test

e1070ec

Move rollups logic and tests to usage_collection

ef65134

Extract search logic to separate module

d890d97

[CI] Auto-commit changed files from 'node scripts/lint_ts_projects --…

8f460ab

…fix'

Improve testing coverage

0ae774c

Default to sorting by _shard_doc when Pit is used

f4c9710

Use PIT search to retrieve usage counters

d8e86e4

Expose search() method in the stop() hook too

f58e23b

gsoldevila force-pushed the kbn-usage-counters-search-api branch from 0240e57 to f58e23b Compare August 1, 2024 11:57

afharo approved these changes Aug 1, 2024


gsoldevila added 4 commits August 1, 2024 18:00

Perform mutiple search requests (1 per domain), use bulkDelete

357b474

Keep deleting counters if there might be more of them

adcbf42

Undo interface changes

b8abddb

Fix outdated tests

c62399f

gsoldevila requested a review from a team as a code owner August 2, 2024 08:10

Fix IUsageCounter mocks for failing tests

9b9f007

gsoldevila requested a review from a team as a code owner August 2, 2024 10:27

afharo approved these changes Aug 2, 2024


davismcphee approved these changes Aug 5, 2024


gsoldevila enabled auto-merge (squash) August 5, 2024 09:01

Dosant approved these changes Aug 5, 2024


gsoldevila disabled auto-merge August 5, 2024 09:10

Merge branch 'main' into kbn-usage-counters-search-api

ea10c24

gsoldevila enabled auto-merge (squash) August 5, 2024 13:11

gsoldevila merged commit d9c1f97 into elastic:main Aug 5, 2024

Dosant mentioned this pull request Aug 14, 2024

Dashboard insights flyout with dashboard views #187993

Merged
