Conversation

@ywangd ywangd (Member) commented Jun 2, 2025

This PR adds a new setting to toggle the collection of shard heap usage, and wires ShardHeapUsage into ClusterInfoSimulator.

Relates: #128723

@elasticsearchmachine elasticsearchmachine added v9.1.0 serverless-linked Added by automation, don't add manually labels Jun 2, 2025
@ywangd ywangd added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Jun 2, 2025
@ywangd ywangd requested a review from nicktindall June 12, 2025 09:59
@ywangd ywangd changed the title Stub a heap usage field on ClusterInfo Setting for estimated shard heap allocation decider Jun 12, 2025
@ywangd ywangd marked this pull request as ready for review June 12, 2025 10:02
@ywangd ywangd requested a review from a team as a code owner June 12, 2025 10:02
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Jun 12, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

}
} else {
logger.trace("skipping collecting shard heap usage from cluster, notifying listeners with empty shard heap usage");
shardHeapUsagePerNode = Map.of();
Contributor commented:
This complexity looks a bit nasty, but I can't think of a way to make it nicer.

I wonder if it would look better if we made the decisions up front e.g.

var fetchIndicesStats = diskThresholdEnabled;
var fetchNodeStats = diskThresholdEnabled || shardThresholdEnabled;
var fetchShardHeapUsage = shardThresholdEnabled;

probably wouldn't make much difference.

What about if we had something like...

enum Requirement {
   NodeStats,
   IndicesStats,
   ShardHeapUsage
}

record Feature(Set<Requirement> requirements) {
}

Feature DISK_THRESHOLD = new Feature(Set.of(NodeStats, IndicesStats));
Feature SHARD_HEAP = new Feature(Set.of(NodeStats, ShardHeapUsage));

// then

    // ...
    Set<Requirement> requirements = enabledFeatures.stream()
        .flatMap(feat -> feat.requirements().stream())
        .collect(Collectors.toSet());
    // ...
    if (requirements.contains(NodeStats)) {
        try (var ignored = threadPool.getThreadContext().clearTraceContext()) {
            fetchNodeStats();
        }
    }

You could argue it's overkill now, but if we keep doing this stuff it'll make things easier to read.

@ywangd ywangd (Member, Author) replied:
I am not sure about the 2nd suggestion. At this point it seems to add complexity, e.g. it still needs the three if/else statements at the end. It might be better hidden if Feature somehow handled the fetches internally, but that seems like too much for what we need here. So I pushed d3dd95a to extract each if/else branch into its own method, which is somewhat aligned with your first suggestion. It does not reduce the complexity, but I feel the readability is better since it is clearer when a particular set of stats needs to be fetched. Please let me know if that works for you. Thanks!
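The shape of that refactor can be sketched roughly as follows. This is a hypothetical, self-contained sketch, not the actual PR code: the class name, method names, and the string-recording of fetches are all illustrative assumptions, standing in for the real InternalClusterInfoService logic.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: each branch of the original if/else is extracted into its own
// method whose argument states exactly when those stats are fetched.
class ClusterInfoFetchSketch {
    final boolean diskThresholdEnabled;
    final boolean shardHeapThresholdEnabled;
    // Records which fetches ran, standing in for the real async calls.
    final List<String> fetched = new ArrayList<>();

    ClusterInfoFetchSketch(boolean disk, boolean heap) {
        this.diskThresholdEnabled = disk;
        this.shardHeapThresholdEnabled = heap;
    }

    void refresh() {
        maybeFetchIndicesStats(diskThresholdEnabled);
        maybeFetchNodeStats(diskThresholdEnabled || shardHeapThresholdEnabled);
        maybeFetchShardHeapUsage(shardHeapThresholdEnabled);
    }

    // Indices stats are only needed by the disk threshold decider.
    void maybeFetchIndicesStats(boolean enabled) {
        if (enabled) fetched.add("indicesStats");
    }

    // Node stats are needed by both deciders.
    void maybeFetchNodeStats(boolean enabled) {
        if (enabled) fetched.add("nodeStats");
    }

    // Shard heap usage is only needed by the heap decider.
    void maybeFetchShardHeapUsage(boolean enabled) {
        if (enabled) fetched.add("shardHeapUsage");
    }
}
```

The complexity (which fetch depends on which setting) still exists, but it is now expressed once per fetch in a named method, rather than interleaved in nested if/else branches.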

Contributor replied:
Yeah that improves the readability, thanks.

@nicktindall nicktindall (Contributor) left a comment


LGTM with some minor nitpicking

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 17, 2025
@elasticsearchmachine elasticsearchmachine merged commit adf4d10 into elastic:main Jun 17, 2025
24 checks passed
@ywangd ywangd deleted the ES-11445-cluster-info-stub branch June 17, 2025 03:28
