[Usage Counters] Enhancements to the APIs#187665
Conversation
9596133 to
e2eb96d
Compare
usage-counters
e2eb96d to
9385678
Compare
usage-counters|
Pinging @elastic/kibana-core (Team:Core) |
c99c82a to
f0904a4
Compare
There was a problem hiding this comment.
Why do we need this internal Core module import (@kbn/core-saved-objects-api-server-internal)? Is that only to stub getCurrentTime during the integration tests?
There was a problem hiding this comment.
Yes I'm afraid so, it bothers me too.
Perhaps I can try to find a cleaner way to insert old counters for testing purposes.
There was a problem hiding this comment.
In f90542c I used the standard incrementCounter to create the counters, and then I used esClient.updateByQuery() to modify their updated_at dates. Not ideal, but cleaner than the mock IMO, and addresses your feedback above.
a115044 to
dfd4335
Compare
f9c4b4f to
f90542c
Compare
There was a problem hiding this comment.
I moved the rollups logic (aka delete counters older than 5 days) to the usage_collection plugin, IMO it makes more sense to have it there:
- It is specific of usage counters,
kibana_usage_collectionhandles plenty of other collectors. - If someone disables
kibana_usage_collection, usage counters would be captured and persisted indefinitely.
WDYT?
8937136 to
b185053
Compare
pgayvallet
left a comment
There was a problem hiding this comment.
Implementation looks fine to me, but I'm not the one with the most knowledge of this area, so a second review would probably make sense
There was a problem hiding this comment.
NIT: we're not following our service pattern here, this method shouldn't be public / used directly. But it's not really that significant.
There was a problem hiding this comment.
++ ATM I am working on some enhancements that include making this private.
UPDATE: Addressed with fe9fb62
afharo
left a comment
There was a problem hiding this comment.
Overall LGTM. I added a few comments that I'd like to discuss before approving.
There was a problem hiding this comment.
nit: how about returning an empty response instead? I wonder if throwing this error will create the need for plugins to check if it's enabled (and we don't provide an API to share if it's enabled or not).
Alternatively, we could allow searching, only that, when enabled: false, we don't store more data. WDYT?
There was a problem hiding this comment.
The problem is I only obtain the search method when calling the service.start(). Either:
- I call start (and we'll have a few RxJS timers running without buffering any events)
- Or I make the
searchmethod public (we break consistency with other services). - Or I expose the
searchin the response of thestop()hook.
There was a problem hiding this comment.
I chose the 3rd option. Fixed in f58e23b
There was a problem hiding this comment.
hmm, I don't think this approach is valid anymore...
If we have 12 dashboards viewed every day during the last 90 days, they'll show up in the first page and we won't remove others...
I wonder if we should change this to group per retention days (domainId: 'dashboard' and updated_at < retentionPeriodDays vs. not domainId: 'dashboard' and updated_at < USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS).
There was a problem hiding this comment.
Good catch! we'll have to think about a better strategy.
There was a problem hiding this comment.
After discussion, we can do a search by domainId, and filter by updated_at < (now - retentionPeriodDays).
We can then loop through the different domain IDs.
Will assess bulkDelete following @TinaHeiligers's latest comment.
There was a problem hiding this comment.
nit: we can do it in a follow up: but we can now use the bulkDelete API :)
There was a problem hiding this comment.
We'll see about that, cause I believe the bulkDelete does not allow deleting from multiple namespaces.
There was a problem hiding this comment.
It will when you add 'force'. Please check that the option hasn't changed.
There was a problem hiding this comment.
nit: I don't see us defaulting it anywhere here.
How about defaulting retentionPeriodDays = USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS if not provided?
There was a problem hiding this comment.
It's defaulted in the rollup logic, this way the in-memory UsageCounter's are lighter (no need to store the property if it matches the default value).
There was a problem hiding this comment.
IMO, aggregations might better achieve what we're after here.
I imagine an API like:
usageCounters.search({
filters: {
domainId,
counterName,
counterType,
source,
timestamp: { [lt|lte|gt|gte]: Date }
},
aggregation_keys: [
'domainId',
'counterName',
'counterType',
'source',
'timestamp'
]
});
That we internally map to an aggregation call and return flattened.
So... for a counter with this structure:
{
"domainId": "dashboard",
"counterName": "<dashboard-id>",
"counterType": "views",
"source": "server"
}- If I only care about the grand total of my N-day retention period for all my dashboards, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
source: "server",
},
aggregation_keys: []
});And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
source: "server",
count: 9999999,
}
]
}- If I want the grand total for each dashboard, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
source: "server",
},
aggregation_keys: [
'counterName'
]
});And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
source: "server",
counterName: "dashboard-1"
count: 10,
},
{
domainId: "dashboard",
counterType: "views",
source: "server",
counterName: "dashboard-2"
count: 9999989,
}
]
}- If I want the histogram for a specific dashboard and time range, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2",
source: "server",
timestamp: {
gte: "2024-07-01T00:00:00.000Z",
lte: "2024-07-05T00:00:00.000Z",
}
},
aggregation_keys: [
'timestamp'
]
});And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2"
source: "server",
timestamp: "2024-07-01T00:00:00.000Z"
count: 10,
},
{
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2"
source: "server",
timestamp: "2024-07-03T00:00:00.000Z"
count: 9999989,
}
]
}The benefits of the aggregations are:
- No pagination required
- Can be requested at any level
WDYT?
There was a problem hiding this comment.
It's an interesting proposal!
In the current use case, Anton still has to retrieve individual days counters to show them in the UI, so I went for the simplest approach possible, and let him aggregate counts on his side (saving 1 call in the process).
But I agree that it seems desirable to perform the aggregations on our side long term, makes for a more elegant API. Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d).
Let's discuss this offline!
There was a problem hiding this comment.
Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the
from: stringparameter. This way we can filter and only get counters that are more recent than a certain date (e.g.now - 90d).
AFAIK, the recommended way to paginate is via PIT for various reasons:
- The
from+sizetechnique is limited to 10_000 entries: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html (no matter if they are 10 pages of 1000 or 1000 pages of 10) - If updates occur in the process, the list will be resorted, so getting the 2nd+ pages may return previously fetched documents if new documents are indexed during the pagination.
AFAIK, we'll always retrieve all values queried, since the intention is not to show these values in a table that the user can paginate. So I don't think we're saving ourselves from any potential issues.
Anton still has to retrieve individual days counters to show them in the UI
@Dosant, just FMI, will you retrieve all days for the histogram, and add them up to get the total? Or will the histogram be of the last 7 days and the total will account for the entire retention period (30d? 90d?)
Let's discuss this offline!
Sure! Happy to meet when you're back :)
54c6a12 to
0240e57
Compare
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]Public APIs missing comments
Public APIs missing exports
History
|
0240e57 to
f58e23b
Compare
|
|
||
| export function createCloudUsageCollector( | ||
| usageCollection: UsageCollectionSetup, | ||
| usageCollection: ICollectorSet, |
There was a problem hiding this comment.
nit: I think this is still UsageCollectionSetup. It's getting the entire plugin contract.
Prob what triggered is the name ICollectorSet. It sounds a bit confusing for a CollectorSet to be needed to create a usage collector...
WDYT?
There was a problem hiding this comment.
I thought about that for a while, and I think the root problem is "naming is hard".
IMO it would make more sense to name it CollectorManager instead of ICollectorSet.
Then, UsageCollectionSetup would have both the CollectorManager & UsageCountersServiceSetup.
NB the code above does not need anything about the UsageCounters.
The problem is that lots of plugins are already using the global UsageCollectionSetup, so changing all the references to the more specific CollectorManager would take some work + codeowners approvals.
I only changed this one cause it falls under our ownership.
I can rollback these changes and we can tackle this on a separate PR.
UPDATE: rolled back with b8abddb
| if (toDelete.length === ROLLUP_BATCH_SIZE) { | ||
| // we found a lot of old Usage Counters, put the counter back in the queue, as there might be more | ||
| counterQueue.push(counter); | ||
| } |
davismcphee
left a comment
There was a problem hiding this comment.
Data Discovery changes LGTM
There was a problem hiding this comment.
x-pack/plugins/reporting/server/routes/ change due to interface changes lgtm.
I haven't rebased my frontend work yet to check if everything is still looking good, I'll try today or tomorrow. but don't want to block, up to you if you'd like to wait Looks good!
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing comments
Public APIs missing exports
History
To update your PR or re-run it, just comment with: |
Summary
Part of #186530
Follow-up of #187064
The goal of this PR is to provide the necessary means to allow implementing the Counting views part of the Dashboards++ initiative.
We do this by extending the capabilities of the usage counters APIs: