
Extend memory instrumentation to categorization #724

@droberts195

Description


Following elastic/elasticsearch#47516 there is much less slack in the "process overhead" added to the expected size of every ML C++ process. This means we could badly underestimate memory requirements when features that are not instrumented at all, such as categorization, are in use. (Previously the relatively small memory usage of categorization would easily have been absorbed into the very generous 100MB process overhead.)

To prevent problems when categorization is in use, the memory instrumentation of the autodetect process should be extended to cover categorization.

This is less trivial than it sounds, because the memory instrumentation code currently lives in the model library whereas categorization is in the api library. As part of this work, the resource monitor class therefore needs to move to the api library. This would also make it possible to instrument the outermost parts of anomaly detection, as well as categorization.
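To make the architectural point concrete, here is a minimal C++ sketch, assuming that each instrumented component can report its own footprint through a common interface. `MemoryReporter`, `ResourceMonitor`, `AnomalyModel` and `Categorizer` are hypothetical names invented for illustration; they are not the actual ml-cpp classes or their APIs. The idea is that once the monitor sits in a layer that can see both the anomaly detection models and the categorizer, it can sum their usage against a single limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface for any component whose memory should be counted.
class MemoryReporter {
public:
    virtual ~MemoryReporter() = default;
    virtual std::size_t memoryUsage() const = 0;
};

// Hypothetical monitor placed in the api layer, where it can see both the
// anomaly detection models and the categorizer.
class ResourceMonitor {
public:
    explicit ResourceMonitor(std::size_t limitBytes) : m_LimitBytes{limitBytes} {}

    void registerReporter(const MemoryReporter* reporter) {
        assert(reporter != nullptr);
        m_Reporters.push_back(reporter);
    }

    // Sum of the usage reported by every registered component.
    std::size_t totalMemoryUsage() const {
        std::size_t total = 0;
        for (const MemoryReporter* reporter : m_Reporters) {
            total += reporter->memoryUsage();
        }
        return total;
    }

    bool withinLimit() const { return this->totalMemoryUsage() <= m_LimitBytes; }

private:
    std::size_t m_LimitBytes;
    std::vector<const MemoryReporter*> m_Reporters;
};

// Stand-in for an anomaly detection model with a fixed footprint.
class AnomalyModel : public MemoryReporter {
public:
    explicit AnomalyModel(std::size_t bytes) : m_Bytes{bytes} {}
    std::size_t memoryUsage() const override { return m_Bytes; }

private:
    std::size_t m_Bytes;
};

// Stand-in for a categorizer whose footprint grows with the examples it keeps.
class Categorizer : public MemoryReporter {
public:
    void addCategory(const std::string& example) { m_Examples.push_back(example); }

    // Approximate: counts each string's capacity even when a short string is
    // stored inline, and ignores allocator overhead.
    std::size_t memoryUsage() const override {
        std::size_t total = sizeof(*this) + m_Examples.capacity() * sizeof(std::string);
        for (const std::string& example : m_Examples) {
            total += example.capacity();
        }
        return total;
    }

private:
    std::vector<std::string> m_Examples;
};
```

For example, registering an `AnomalyModel` and a `Categorizer` with the same monitor and then adding examples to the categorizer makes `totalMemoryUsage()` grow, and that growth now counts towards the limit instead of having to fit inside the process overhead.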

Instrumenting the memory usage of categorization is also a realistic prerequisite for allowing per-partition categorization, because a high-cardinality partition field combined with per-partition categorization could lead to categorization using a lot of memory. Therefore, it makes sense to do the instrumentation first and only then consider adding per-partition categorization.
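A rough sketch of why per-partition categorization needs instrumentation first, assuming per-partition categorization would keep one independent categorizer state per distinct partition field value. `CategorizerState` and `PerPartitionCategorizer` are hypothetical names, not existing ml-cpp classes. Because each distinct partition value gets its own state, memory grows roughly linearly with the cardinality of the partition field.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical state kept by one categorizer: here just the example messages.
struct CategorizerState {
    std::vector<std::string> examples;

    // Approximate footprint; ignores allocator overhead.
    std::size_t memoryUsage() const {
        std::size_t total = sizeof(*this) + examples.capacity() * sizeof(std::string);
        for (const std::string& example : examples) {
            total += example.capacity();
        }
        return total;
    }
};

// Hypothetical per-partition categorization: one independent categorizer state
// for each distinct value of the partition field.
class PerPartitionCategorizer {
public:
    void add(const std::string& partition, const std::string& message) {
        m_ByPartition[partition].examples.push_back(message);
    }

    std::size_t partitionCount() const { return m_ByPartition.size(); }

    // Approximate: counts the key and state of each partition, but not the
    // map's internal node overhead.
    std::size_t memoryUsage() const {
        std::size_t total = sizeof(*this);
        for (const auto& [partition, state] : m_ByPartition) {
            total += sizeof(std::string) + partition.capacity() + state.memoryUsage();
        }
        return total;
    }

private:
    std::map<std::string, CategorizerState> m_ByPartition;
};
```

Feeding the same message into 1,000 distinct partitions instead of 10 produces 1,000 separate states, so the reported usage is about two orders of magnitude larger even though every input message is identical.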
