Extend memory instrumentation to categorization #724

Following elastic/elasticsearch#47516 there is much less slack in the "process overhead" added to the expected size of every ML C++ process. This means we could underestimate memory requirements quite badly when features that are not instrumented at all, such as categorization, are in use. (Previously the relatively small memory usage of categorization would easily have been absorbed into the very generous 100MB process overhead.)

To prevent problems when categorization is in use, the memory instrumentation of the `autodetect` process should be extended to cover categorization.

This is not as trivial as it sounds, because the memory instrumentation code is currently in the `model` library but categorization is in the `api` library. So as part of this work the resource monitor class needs to be moved to the `api` library. The move also makes it possible to instrument the outermost parts of anomaly detection as well as categorization.
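To illustrate why a monitor living in the outer library can cover both anomaly detection and categorization, here is a minimal sketch. It assumes a callback-based design; the class `ResourceMonitorSketch` and its methods are hypothetical names chosen for illustration and are not the existing ml-cpp resource monitor interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the ml-cpp API: a monitor that knows nothing about
// the concrete components it tracks. Each component supplies a callback that
// reports its current memory usage in bytes.
class ResourceMonitorSketch {
public:
    using MemoryUsageFunc = std::function<std::size_t()>;

    // Register any component (an anomaly detector, a categorizer, ...)
    // under a descriptive name.
    void registerComponent(std::string name, MemoryUsageFunc usage) {
        m_Components.emplace_back(std::move(name), std::move(usage));
    }

    // Sum the usage reported by every registered component.
    std::size_t totalMemory() const {
        std::size_t total{0};
        for (const auto& component : m_Components) {
            total += component.second();
        }
        return total;
    }

    // True if the instrumented total exceeds the given limit in bytes.
    bool isOverLimit(std::size_t limitBytes) const {
        return this->totalMemory() > limitBytes;
    }

private:
    std::vector<std::pair<std::string, MemoryUsageFunc>> m_Components;
};
```

With a design along these lines, categorization would register its own callback alongside the anomaly detectors, so its usage counts towards the same total that is checked against the limit.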

Instrumenting the memory usage of categorization is also a realistic prerequisite for allowing per-partition categorization, because a high-cardinality partition field combined with per-partition categorization could lead to categorization using a lot of memory. It therefore makes sense to do the instrumentation first and only then consider adding per-partition categorization.
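As a rough illustration of the cardinality concern, the sketch below assumes the simplest possible cost model: one categorizer per distinct partition value, each with the same footprint. The function name and the figures in the usage note are hypothetical and are not measurements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical cost model: memory grows linearly with the number of distinct
// partition field values, because each value gets its own categorizer.
constexpr std::size_t estimatePerPartitionCategorizationBytes(std::size_t distinctPartitionValues,
                                                              std::size_t bytesPerCategorizer) {
    return distinctPartitionValues * bytesPerCategorizer;
}
```

Under this assumption, a categorizer costing 200 KB (204,800 bytes) is negligible on its own, but 10,000 distinct partition values would bring the total to roughly 2 GB, which is far too much to leave uninstrumented.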
