Support dynamic node role #2877

@ylwu-amzn

Description

OpenSearch does not currently support an ML node role: DiscoveryNodeRole.java defines no ML role.

We need a dedicated ML node for the ML plugin

We released a new ML plugin, ml-commons, in 1.3, and we plan to add more models.

ML models generally consume more resources, especially during training. We are going to support bigger ML models, which might require more resources and special hardware such as GPUs. Because OpenSearch doesn't support an ML node, we dispatch ML tasks to data nodes only. That means a user who wants to train a big model has to scale up all data nodes, which is costly and unreasonable. If we support a dedicated ML node, users don't need to scale up their data nodes at all: they only need to configure a new ML node (with different settings and a more powerful instance type) and add it to the cluster. Running ML tasks on a dedicated node also isolates resource usage better, which reduces the impact on other work such as search and ingestion.
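The dispatch idea described above can be sketched as follows. This is only an illustration, not ml-commons code: the `Node` class, the method names, and in particular the fallback to data nodes when no ML node exists are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MlTaskDispatchSketch {

    /** Hypothetical, minimal view of a cluster node: a name and its configured roles. */
    static final class Node {
        final String name;
        final Set<String> roles;

        Node(String name, String... roles) {
            this.name = name;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    /** Returns the names of all nodes that carry the given role, in cluster order. */
    static List<String> withRole(List<Node> cluster, String role) {
        List<String> names = new ArrayList<>();
        for (Node node : cluster) {
            if (node.roles.contains(role)) {
                names.add(node.name);
            }
        }
        return names;
    }

    /**
     * Picks the nodes an ML task may run on: dedicated "ml" nodes if any exist,
     * otherwise data nodes (the fallback is an assumption of this sketch).
     */
    static List<String> eligibleNodes(List<Node> cluster) {
        List<String> mlNodes = withRole(cluster, "ml");
        return mlNodes.isEmpty() ? withRole(cluster, "data") : mlNodes;
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("data-1", "data", "ingest"),
            new Node("data-2", "data"),
            new Node("ml-1", "ml"));
        System.out.println(eligibleNodes(cluster)); // prints [ml-1]
    }
}
```

With a dedicated ML node in the cluster, the data nodes are no longer candidates for ML tasks, which is the resource separation this issue is asking for.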

What changes we need to make

OpenSearch

  • Add a new node role, ml, in DiscoveryNodeRole.java
  • Support configuring the ML node role in opensearch.yml
  • Security: should be OK, since the ML node communicates with other nodes via transport actions.

[Updated solution]: We plan to let users configure any node role name in the opensearch.yml file. A node will read unknown roles as dynamic roles rather than fail to start. The abbreviation of a dynamic role will be the same as its full name. To avoid confusion, role full names and abbreviations are treated as case-insensitive and must be written in lower-case. When a user calls the _cat/nodes API, we will return the full names of all roles, both built-in and dynamic, in a new field, node.roles. See the discussion below for more details.
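A minimal sketch of the role resolution described above, for discussion only. It does not use the real DiscoveryNodeRole class: `Role`, `resolve`, and `catNodesRolesColumn` are hypothetical names, only two built-in roles are modelled, and both rejecting non-lower-case names with an exception and joining role names with commas are assumptions about how the rules might be enforced and displayed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DynamicRoleSketch {

    /** Hypothetical stand-in for a node role: full name, abbreviation, and origin. */
    static final class Role {
        final String name;
        final String abbreviation;
        final boolean builtIn;

        Role(String name, String abbreviation, boolean builtIn) {
            this.name = name;
            this.abbreviation = abbreviation;
            this.builtIn = builtIn;
        }
    }

    // Illustrative subset of built-in roles; the real list lives in DiscoveryNodeRole.java.
    static final Map<String, Role> BUILT_IN = new HashMap<>();
    static {
        BUILT_IN.put("data", new Role("data", "d", true));
        BUILT_IN.put("ingest", new Role("ingest", "i", true));
    }

    /** Resolves a configured role name to a built-in role or, if unknown, a dynamic role. */
    static Role resolve(String configured) {
        // Assumption: a name that is not lower-case is rejected outright.
        if (!configured.equals(configured.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("role name must be lower-case: " + configured);
        }
        Role builtIn = BUILT_IN.get(configured);
        if (builtIn != null) {
            return builtIn;
        }
        // Unknown role: keep it as a dynamic role whose abbreviation equals its full name.
        return new Role(configured, configured, false);
    }

    /** Builds the value of the proposed node.roles field: full names of all roles. */
    static String catNodesRolesColumn(List<String> configuredRoles) {
        List<String> names = new ArrayList<>();
        for (String configured : configuredRoles) {
            names.add(resolve(configured).name);
        }
        return String.join(",", names);
    }
}
```

Under these assumptions, a node configured with the roles data and ml would resolve data to the built-in role with abbreviation d, treat ml as a dynamic role with abbreviation ml, and report both full names in node.roles.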

ML plugin

Metadata

    Labels

      • discuss: Issues intended to help drive brainstorming and decision making
      • distributed framework
      • enhancement: Enhancement or improvement to existing feature or request
      • feature: New feature or request
      • v2.1.0: Issues and PRs related to version 2.1.0
