[SLO] Fix issue where filters do not apply to overview stats#234218
[SLO] Fix issue where filters do not apply to overview stats#234218baileycash-elastic merged 31 commits intoelastic:mainfrom
Conversation
|
/ci |
93770dc to
e13e1e5
Compare
e13e1e5 to
70e30f2
Compare
cc6cbaa to
82a4035
Compare
|
/ci |
|
Pinging @elastic/obs-ux-management-team (Team:obs-ux-management) |
x-pack/solutions/observability/plugins/slo/server/services/get_slo_stats_overview.ts
Outdated
Show resolved
Hide resolved
…_slo_stats_overview.ts Co-authored-by: Shahzad <shahzad31comp@gmail.com>
justinkambic
left a comment
There was a problem hiding this comment.
Just a first pass code review on the things that grabbed my eye right away. I'm going to come back to this later, just pressed for time on finishing the review right now.
x-pack/solutions/observability/plugins/slo/server/services/get_slo_stats_overview.ts
Outdated
Show resolved
Hide resolved
| const filters = params.filters ?? ''; | ||
| const kqlQuery = params?.kqlQuery ?? ''; | ||
| const filters = params?.filters ?? ''; | ||
| const parsedFilters = parseStringFilters(filters, this.logger); |
There was a problem hiding this comment.
This parseStringFilters function is not great, because it is reverting TypeScript to typeless JS wild west style programming. But it's also doing a JSON.parse, which is what you do further down, so we don't need to do any JSON.parse(params?.filters).
Ideally there'd be some kind of type checking or at least casting to the expected values within parseStringFilters, but it's already used throughout the server code and is out of scope for refactoring in this PR.
x-pack/solutions/observability/plugins/slo/server/services/get_slo_stats_overview.ts
Outdated
Show resolved
Hide resolved
x-pack/solutions/observability/plugins/slo/server/services/get_slo_stats_overview.ts
Outdated
Show resolved
Hide resolved
💚 Build Succeeded
Metrics [docs]Saved Objects .kibana field count
History
|
|
@PhilippeOberti ohhh yes 😅it's fixed now |
| if (querySLOsForIds) { | ||
| do { | ||
| const sloIdCompositeQueryResponse = await this.scopedClusterClient.asCurrentUser.search({ | ||
| index: '.slo-observability.summary-*', |
There was a problem hiding this comment.
i think this shouldn't be hardcoded. use return value from getSummaryIndices
There was a problem hiding this comment.
should be good to go here
justinkambic
left a comment
There was a problem hiding this comment.
First of all, thank you for writing an excellent battery of unit tests, it made it very easy to experiment with recommended changes.
I have a few areas of this code I want to change. I think we can significantly improve the readability and reduce the number of vars we need to declare to get this work done. I may want to change a few things yet, let's see the output from this round. Check my recommendations and LMK if you disagree with any of the points.
If it is too hard to track what I'm looking for by looking at the comments in the PR diff, I pushed a version of your branch to my remote that includes a commit with everything I mentioned here.
| /* | ||
| If we know there are no SLOs that match the provided filters, we can skip querying for rules and alerts | ||
| */ | ||
| const [rules, alerts] = await Promise.all( |
There was a problem hiding this comment.
I think given how much is happening already within this function, it makes logical sense to pull this block out into a separate function we can call. Something like:
private async fetchRulesAndAlerts({
querySLOsForIds,
sloRuleQueryKeys,
ruleFilters,
alertFilters,
}: {
querySLOsForIds: boolean;
sloRuleQueryKeys: string[];
ruleFilters?: KueryNode;
alertFilters?: QueryDslQueryContainer[];
}) {
return await Promise.all(
querySLOsForIds && sloRuleQueryKeys.length === 0
? [
{
total: 0,
},
{
activeAlertCount: 0,
recoveredAlertCount: 0,
},
]
: [
this.rulesClient.find({
options: {
ruleTypeIds: SLO_RULE_TYPE_IDS,
consumers: [
AlertConsumers.SLO,
AlertConsumers.ALERTS,
AlertConsumers.OBSERVABILITY,
],
...(ruleFilters ? { filter: ruleFilters } : {}),
},
}),
this.racClient.getAlertSummary({
ruleTypeIds: SLO_RULE_TYPE_IDS,
consumers: [AlertConsumers.SLO, AlertConsumers.ALERTS, AlertConsumers.OBSERVABILITY],
gte: moment().subtract(24, 'hours').toISOString(),
lte: moment().toISOString(),
...(alertFilters?.length
? {
filter: alertFilters,
}
: {}),
}),
]
);
}The call signature will remain the same.
| if (buckets && buckets.length > 0) { | ||
| alertFilterTerms = alertFilterTerms.concat( | ||
| ...buckets.map((bucket) => { | ||
| sloRuleQueryKeys.push(bucket.key.sloId); | ||
| return { | ||
| bool: { | ||
| must: [ | ||
| { term: { 'kibana.alert.rule.parameters.sloId': bucket.key.sloId } }, | ||
| ...(instanceIdIncluded | ||
| ? [ | ||
| { | ||
| term: { | ||
| 'kibana.alert.instance.id': bucket.key.sloInstanceId, | ||
| }, | ||
| }, | ||
| ] | ||
| : []), | ||
| ], | ||
| }, | ||
| }; | ||
| }) | ||
| ); | ||
| } |
There was a problem hiding this comment.
I think it is worth extracting this to its own processing function.
private processSloQueryBuckets(
buckets: Array<{ key: { sloId: string; sloInstanceId: string } }>,
sloRuleQueryKeys: string[],
instanceId?: string
): QueryDslQueryContainer[] {
return buckets.map((bucket) => {
// alternatively, return `bucket.key.sloId` and add it to the `sloRuleQueryKeys` in the loop,
// this way we get rid of one of the params here
sloRuleQueryKeys.push(bucket.key.sloId);
return {
bool: {
must: [
{ term: { 'kibana.alert.rule.parameters.sloId': bucket.key.sloId } },
...(instanceId
? [
{
term: {
'kibana.alert.instance.id': bucket.key.sloInstanceId,
},
},
]
: []),
],
},
};
});
}Then the code in your loop becomes much simpler to understand:
afterKey = this.getAfterKey(sloIdCompositeQueryResponse.aggregations?.sloIds);
const buckets = (
sloIdCompositeQueryResponse.aggregations?.sloIds as {
buckets?: Array<{ key: { sloId: string; sloInstanceId: string } }>;
}
)?.buckets;
if (buckets) {
alertFilterTerms.push(
...this.processSloQueryBuckets(buckets, sloRuleQueryKeys, instanceId)
);
}
} while (afterKey);You could even parallelize this by making it async and dumping the promises in an async queue that lives outside the loop so you don't delay subsequent calls to the DB. Then, further down when it's time to use sloRuleQueryKeys and alertFilterTerms you can just await the promise and pick up all the items in the array. I didn't write this code because it's probably over-optimizing.
| import { getSummaryIndices, getSloSettings } from './slo_settings'; | ||
| import { getElasticsearchQueryOrThrow, parseStringFilters } from './transform_generators'; | ||
|
|
||
| const ES_PAGESIZE_LIMIT = 5000; |
There was a problem hiding this comment.
Why did we pick 5000 for the bucket limit for the query? If there's no reason we may want to reduce this to 1000, per the docs for the composite agg:
If all composite buckets should be retrieved it is preferable to use a small size (100 or 1000 for instance) and then use the after parameter to retrieve the next results.
Given this won't change any aspect of the implementation, if there's no real reason for the 5k limit we may want to reduce this to 1k.
| let alertFilters: QueryDslQueryContainer[] = []; | ||
| let alertFilterTerms: QueryDslQueryContainer[] = []; | ||
| let afterKey: AggregationsAggregate | undefined; | ||
| let ruleFilters: KueryNode | undefined; |
There was a problem hiding this comment.
With the changes I recommended below it becomes possible to have only afterKey declared as a let, and we can delay the declaration of alertFilters and ruleFilters until after the loop finishes.
const alertFilterTerms: QueryDslQueryContainer[] = [];
let afterKey: AggregationsAggregate | undefined;
| const resultNodes = nodeBuilder.or( | ||
| sloRuleQueryKeys.map((sloId) => nodeBuilder.is(`alert.attributes.params.sloId`, sloId)) | ||
| ); | ||
|
|
||
| ruleFilters = resultNodes; | ||
| alertFilters = [ | ||
| { | ||
| bool: { | ||
| should: [...alertFilterTerms], | ||
| }, | ||
| }, | ||
| ]; | ||
| } | ||
| } catch (error) { | ||
| this.logger.error(`Error querying SLOs for IDs: ${error}`); | ||
| throw error; | ||
| } |
There was a problem hiding this comment.
I recommend pulling these blocks out of the loop. It will simplify and you're already adding these values to a list. Include a default short-circuit when there are no values retrieved in each of the declarations and we can keep this closer to the top level of the function.
const ruleFilters: KueryNode | undefined =
sloRuleQueryKeys.length > 0
? nodeBuilder.or(
sloRuleQueryKeys.map((sloId) => nodeBuilder.is(`alert.attributes.params.sloId`, sloId))
)
: undefined;
const alertFilters =
alertFilterTerms.length > 0
? [
{
bool: {
should: [...alertFilterTerms],
},
},
]
: [];
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]
History
|
|
Starting backport for target branches: 8.19, 9.1, 9.2 https://github.com/elastic/kibana/actions/runs/18694598688 |
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI. Manual backportTo create the backport manually run: Questions ?Please refer to the Backport tool documentation |



Summary
Closes #233631
This PR aims to fix inconsistencies with the SLO overview stats where the alerts and rules count would not represent filtered SLO results, unlike healthy vs violated count.
Demo
Screen.Recording.2025-09-10.at.4.54.44.PM.mov