[Security Solution][Detections] Adoption telemetry #71102

rylnd merged 15 commits into elastic:master from
Conversation
This uses ML and raw ES calls to query our ML Jobs and Rules, and parse them into a format to be consumed by telemetry. Still to come: * initialization * tests
The service seems to convert colons to underscores, so let's just use an underscore.
This allows us to test our adherence to the collector API, focusing particularly on the fetch function.
We're going to have our usage data under one key corresponding to the app, so this nests the existing data under a 'detections' key while allowing another fetching function to be plugged into the main collector under a separate key.
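A minimal sketch of the nesting described above: each feature contributes its own fetch function, and the collector's top-level fetch merges them under separate keys. The interface and function names here (`fetchDetectionsUsage`, `fetchEndpointUsage`, etc.) are illustrative, not the actual Kibana implementation.

```typescript
// Illustrative usage shapes; field names are assumptions for the sketch.
interface DetectionsUsage {
  detections_count: number;
}
interface EndpointUsage {
  endpoints_count: number;
}
interface SecuritySolutionUsage {
  detections: DetectionsUsage;
  endpoint: EndpointUsage;
}

// Stand-ins for the per-feature fetchers.
const fetchDetectionsUsage = async (): Promise<DetectionsUsage> => ({
  detections_count: 2,
});
const fetchEndpointUsage = async (): Promise<EndpointUsage> => ({
  endpoints_count: 1,
});

// The main collector's fetch nests each feature's data under its own key,
// so a new feature only needs to plug in another fetcher.
const fetchUsage = async (): Promise<SecuritySolutionUsage> => ({
  detections: await fetchDetectionsUsage(),
  endpoint: await fetchEndpointUsage(),
});
```

The benefit of this shape is that adding endpoint (or any future) telemetry doesn't touch the detections code path at all.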
```ts
filterPath: ['hits.hits._source.alert.enabled', 'hits.hits._source.alert.tags'],
ignoreUnavailable: true,
index,
size: 10000, // elasticsearch index.max_result_window default value
```
Are we ever likely to exceed this limit?
No, it doesn't seem likely that a user would have 10k rules, but this ensures that if they did, we'd tally the first 10k instead of receiving an error and returning 0s.
This was used in another collector, but if you think there's risk here I'm happy to remove it!
I think it's fine. And I hope they don't have more than 10k rules 😅. I was really just curious whether there was a way past this limit other than scroll. I ran into an issue previously where I needed to raise this max just to get all of the data I wanted (ES has a default cap on the number of results it returns), but this looks fine.
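For reference, one way past the 10k `index.max_result_window` cap (besides scroll) is `search_after` pagination. This is a hedged sketch, not the PR's code: `search` here is a stand-in for an Elasticsearch client call that returns hits sorted on a tiebreaker field and accepts the previous page's last sort value.

```typescript
// Illustrative hit shape: each hit carries the sort value used to resume.
interface Hit {
  sort: [number];
  enabled: boolean;
}

// Pages through all results by passing the last hit's sort value back in,
// mirroring Elasticsearch's search_after semantics.
const pageAll = async (
  search: (after?: [number]) => Promise<Hit[]>
): Promise<Hit[]> => {
  const all: Hit[] = [];
  let after: [number] | undefined;
  while (true) {
    const page = await search(after);
    if (page.length === 0) break; // no more results
    all.push(...page);
    after = page[page.length - 1].sort; // resume after the last sort value
  }
  return all;
};
```

For a bounded tally like this one, the single capped request in the PR is simpler; `search_after` only becomes worthwhile if the count could realistically exceed the window.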
* inlines collector options * inlines schema object * makes DetectionsUsage an interface instead of a type alias
Conflicts: x-pack/plugins/security_solution/server/plugin.ts
Pinging @elastic/siem (Team:SIEM)
```ts
config: {
  job_type: 'anomaly_detector',
  description:
    'SIEM Auditbeat: Looks for unusual destination port activity that could indicate command-and-control, persistence mechanism, or data exfiltration activity (beta)',
```
Just checking we haven't migrated these detections to the SecuritySolution naming pattern?
I took these from live data and the rename hasn't happened yet. I'll be sure to update these appropriately once that happens 👍
```ts
if (ruleResults.hits?.hits?.length > 0) {
  ruleMetrics = ruleResults.hits.hits.map((hit) => ({
```
Not really an issue, but why not do the tally here rather than iterating twice? Or alternatively, just return ruleResults and then iterate once in the usageBuilder.
That's a good point, I'll change these to do a single loop 👍
```ts
const moduleJobs = modules.flatMap((module) => module.jobs);
const jobs = await jobServiceProvider(mlCaller).jobsSummary(['siem']);

jobMetrics = jobs.map((job) => ({
```
Same thing here, but I'm not sure how much of an issue it'll be to loop over all this data twice. I know we have about 60s until the usageCollector times out, but I don't think we'll run into that.
We were previously performing two loops over each set of data: one to format it down to just the data we need, and another to convert that into usage data. We now perform both steps within a single loop.
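The single-loop refactor described above can be sketched with a `reduce` that accumulates the counts directly, instead of first mapping hits down to metrics and then iterating again to tally them. The `RuleHit` shape and the usage key names below are illustrative, not the PR's exact types.

```typescript
// Illustrative per-hit shape after extracting the fields we care about.
interface RuleHit {
  enabled: boolean;
  isElastic: boolean;
}

interface RulesUsage {
  custom_enabled: number;
  custom_disabled: number;
  elastic_enabled: number;
  elastic_disabled: number;
}

const initialUsage: RulesUsage = {
  custom_enabled: 0,
  custom_disabled: 0,
  elastic_enabled: 0,
  elastic_disabled: 0,
};

// One pass: classify each hit and bump the matching counter.
const tallyRules = (hits: RuleHit[]): RulesUsage =>
  hits.reduce(
    (usage, hit) => {
      const key = `${hit.isElastic ? 'elastic' : 'custom'}_${
        hit.enabled ? 'enabled' : 'disabled'
      }` as keyof RulesUsage;
      usage[key] += 1;
      return usage;
    },
    { ...initialUsage }
  );
```

The intermediate `ruleMetrics` array disappears entirely; formatting and tallying happen in the same pass.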
```ts
const jobs = await jobServiceProvider(mlCaller).jobsSummary(['siem']);

jobsUsage = jobs.reduce((usage, job) => {
  const isElastic = moduleJobs.some((moduleJob) => moduleJob.id === job.id);
```
Does SIEM apply a prefix to the module job ids when initially setting them up? If so, this won't match and you'll have to do a partial match on the last part of the job id.
Confirmed that this is not currently an issue, but good to know moving forward.
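If prefixed job ids ever did become an issue, the reviewer's suggested partial match might look like the sketch below. The helper name `isModuleJob` and the assumption that a prefix would be joined with an underscore are both hypothetical, not confirmed behavior.

```typescript
// Hedged sketch: treat a job as a module job if its id matches exactly,
// or ends with a module id preceded by an assumed underscore separator.
const isModuleJob = (jobId: string, moduleJobIds: string[]): boolean =>
  moduleJobIds.some(
    (moduleId) => jobId === moduleId || jobId.endsWith(`_${moduleId}`)
  );
```

Since the exact-id comparison currently works, this is only a note for the future rename mentioned above.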
afharo
left a comment
Kibana Telemetry changes LGTM!
```ts
detections: {
  detection_rules_custom_enabled: { type: 'number' },
  detection_rules_custom_disabled: { type: 'number' },
  detection_rules_elastic_enabled: { type: 'number' },
  detection_rules_elastic_disabled: { type: 'number' },
  ml_jobs_custom_enabled: { type: 'number' },
  ml_jobs_custom_disabled: { type: 'number' },
  ml_jobs_elastic_enabled: { type: 'number' },
  ml_jobs_elastic_disabled: { type: 'number' },
},
```
NIT: Can we use `type: 'long'` instead (or `'float'` if that's the intended format)?
This will help us understand the type of number being indexed 🙂
NIT 2: Have you considered using a tree structure instead?
```json
{
  "properties": {
    "detections": {
      "properties": {
        "detection_rules": {
          "custom": {
            "enabled": { "type": "long" },
            "disabled": { "type": "long" }
          },
          "elastic": {
            "enabled": { "type": "long" },
            "disabled": { "type": "long" }
          }
        },
        "ml_jobs": {
          "custom": {
            "enabled": { "type": "long" },
            "disabled": { "type": "long" }
          },
          "elastic": {
            "enabled": { "type": "long" },
            "disabled": { "type": "long" }
          }
        }
      }
    }
  }
}
```
💚 Build Succeeded
* style: sort plugin interface
* WIP: UsageCollector for Security Adoption. This uses ML and raw ES calls to query our ML Jobs and Rules, and parse them into a format to be consumed by telemetry. Still to come: initialization; tests
* Initialize usage collectors during plugin setup
* Rename usage key. The service seems to convert colons to underscores, so let's just use an underscore.
* Collector is ready if we have a kibana index
* Refactor collector to generate options in a function. This allows us to test our adherence to the collector API, focusing particularly on the fetch function.
* Refactor usage collector in anticipation of endpoint data. We're going to have our usage data under one key corresponding to the app, so this nests the existing data under a 'detections' key while allowing another fetching function to be plugged into the main collector under a separate key.
* Update our collector to satisfy telemetry tooling: inlines collector options; inlines schema object; makes DetectionsUsage an interface instead of a type alias
* Extracts telemetry mappings via scripts/telemetry_extract
* Refactor detections usage logic to perform one loop instead of two. We were previously performing two loops over each set of data: one to format it down to just the data we need, and another to convert that into usage data. We now perform both steps within a single loop.
* Refactor detections telemetry to be nested
* Extract new nested detections telemetry mappings

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Pinging @elastic/security-solution (Team: SecuritySolution)
Summary
This adds a new UsageCollector that collects adoption data for both Detection Rules and ML Jobs within our solution's group.
The metrics being collected are:
TODO:
Checklist
Delete any items that are not applicable to this PR.
For maintainers