Closed
6 changes: 4 additions & 2 deletions python/ray/autoscaler/_private/resource_demand_scheduler.py
@@ -43,6 +43,8 @@
)
from ray.core.generated.common_pb2 import PlacementStrategy

+import random

logger = logging.getLogger(__name__)

# The minimum number of nodes to launch concurrently.
@@ -786,8 +788,8 @@ def get_nodes_for(
)
break

-utilization_scores = sorted(utilization_scores, reverse=True)
-best_node_type = utilization_scores[0][1]
+weights = [node_types[node_type[1]].get("max_workers", 0) for node_type in utilization_scores]
Collaborator

The weights should be based on `utilization_scores` instead of `max_workers`: we don't want to launch a big machine for a 1-CPU task.

Author

Done.
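
For illustration only, here is a minimal sketch of what weighting by utilization score could look like. It is not the code that was pushed to this PR. It assumes each entry in `utilization_scores` is a `(score, node_type)` pair with a non-negative scalar score; if the real scores are not scalars, they would first need to be reduced to a single number. The helper name `pick_node_type` and the `rng` parameter are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def pick_node_type(
    utilization_scores: List[Tuple[float, str]],
    rng: random.Random,
) -> str:
    """Pick a node type at random, weighted by its utilization score.

    Assumption: scores are non-negative scalars, higher meaning a better fit.
    """
    if not utilization_scores:
        raise ValueError("no feasible node types to choose from")

    # Clamp negatives to zero so every weight is valid for random.choices.
    weights: Optional[List[float]] = [
        max(score, 0.0) for score, _ in utilization_scores
    ]

    # random.choices rejects an all-zero weight list, so fall back to a
    # uniform choice in that case.
    if sum(weights) <= 0:
        weights = None

    return rng.choices(utilization_scores, weights=weights, k=1)[0][1]
```

With this weighting, a node type that is poorly utilized by the pending demand is rarely chosen, while a type with a score of zero is never chosen unless every candidate scores zero.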

+best_node_type = random.choices(utilization_scores, weights=weights, k=1)[0][1]
Collaborator

Ideally we should remember the node type that has no availability and skip it next time.

Author

Agreed, that would be ideal, but it works without it. It cycles through nodes quickly, so skipping unavailable types would not save much time compared with the extra code needed for state management.
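
To make the trade-off discussed above concrete, here is a rough sketch of the state that "remembering" unavailable node types would need. It is not part of this PR. The class name `UnavailableNodeTypes`, the TTL default, and the injectable `clock` are all hypothetical, and how the autoscaler would learn that a launch failed for lack of availability is not shown in this diff, so `mark` is simply assumed to be called from wherever that failure is detected.

```python
import time
from typing import Callable, Dict, List, Tuple


class UnavailableNodeTypes:
    """Remembers node types that recently had no availability.

    Entries expire after `ttl_s` seconds so a node type is retried later.
    """

    def __init__(
        self,
        ttl_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._marked_at: Dict[str, float] = {}

    def mark(self, node_type: str) -> None:
        """Record that a launch of `node_type` just failed."""
        self._marked_at[node_type] = self._clock()

    def is_unavailable(self, node_type: str) -> bool:
        marked = self._marked_at.get(node_type)
        if marked is None:
            return False
        if self._clock() - marked >= self._ttl_s:
            # The entry has expired, so forget it and allow a retry.
            del self._marked_at[node_type]
            return False
        return True

    def filter(
        self, utilization_scores: List[Tuple[float, str]]
    ) -> List[Tuple[float, str]]:
        """Drop candidates whose node type is currently marked unavailable."""
        return [
            (score, node_type)
            for score, node_type in utilization_scores
            if not self.is_unavailable(node_type)
        ]
```

Under these assumptions the candidate list would be passed through `filter` before the random choice. The sketch also shows the extra state the author mentions: the cache has to outlive a single `get_nodes_for` call and needs an expiry policy.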

nodes_to_add[best_node_type] += 1
if strict_spread:
    resources = resources[1:]