46 changes: 45 additions & 1 deletion src/ray/protobuf/autoscaler.proto
@@ -51,27 +51,71 @@ message PlacementConstraint {
optional AffinityConstraint affinity = 2;
}

// The type of operator to use for the label constraint.
enum LabelOperator {
LABEL_OPERATOR_UNSPECIFIED = 0;
// Supports equality or IN semantics.
LABEL_OPERATOR_IN = 1;
// Supports not-equal or NOT IN semantics.
LABEL_OPERATOR_NOT_IN = 2;
Comment on lines +58 to +60
Contributor

So exists and not-exists semantics are not supported initially.

Contributor Author

Contributor

Sounds good. The PR looks good. I will let Janet take a look as well.

Collaborator

Correct, we can add them in the future.
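
To make the two operators concrete, here is a minimal Python sketch of how a `LabelConstraint` from this diff might be evaluated against a node's labels. The dataclass and the `constraint_matches` / `selector_matches` helpers are hypothetical stand-ins, not Ray code. Treating a node that lacks the key as satisfying `NOT_IN` is an assumption the PR does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class LabelOperator(Enum):
    # Mirrors the enum values added in this diff.
    LABEL_OPERATOR_UNSPECIFIED = 0
    LABEL_OPERATOR_IN = 1
    LABEL_OPERATOR_NOT_IN = 2


@dataclass
class LabelConstraint:
    # Hypothetical Python stand-in for the proto message.
    label_key: str
    label_values: List[str]
    operator: LabelOperator


def constraint_matches(c: LabelConstraint, node_labels: Dict[str, str]) -> bool:
    value = node_labels.get(c.label_key)
    if c.operator is LabelOperator.LABEL_OPERATOR_IN:
        # Equality is IN with a single value; a missing key never matches.
        return value is not None and value in c.label_values
    if c.operator is LabelOperator.LABEL_OPERATOR_NOT_IN:
        # ASSUMPTION: a node without the key satisfies NOT_IN.
        return value is None or value not in c.label_values
    # ASSUMPTION: an unspecified operator is rejected rather than ignored.
    raise ValueError(f"unsupported operator: {c.operator}")


def selector_matches(constraints: List[LabelConstraint],
                     node_labels: Dict[str, str]) -> bool:
    # Constraints inside one LabelSelector combine with AND semantics.
    return all(constraint_matches(c, node_labels) for c in constraints)
```

Since exists / not-exists operators are not part of this change, a caller cannot express "the key is present with any value" in this sketch.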

}

// A node label constraint with a key, one or a list of values and an operator.
message LabelConstraint {
// The key of the label
string label_key = 1;
// The values to check against.
repeated string label_values = 2;
// The operator to use for the label constraint.
LabelOperator operator = 3;
}

// A list of node label constraints used to specify the label requirements in a
// resource request.
message LabelSelector {
// The list of node label constraints with AND semantics.
repeated LabelConstraint label_constraints = 1;
}

message ResourceRequest {
// resource requirements for the request.
map<string, double> resources_bundle = 1;
// placement constraint for the request. multiple constraints
// form AND semantics.
repeated PlacementConstraint placement_constraints = 2;
// The node label requirements for the request. Multiple label selectors are for
// fallback mechanism. When trying to find a node that satisfies the label
// requirements, the first label selector should be tried first, if not found,
// the second label selector should be tried, and so on.
repeated LabelSelector label_selectors = 3;
Collaborator

How would this work with coordinated fallback between resources and labels? That is, overriding resource requirements depending on which label selector is selected.

Collaborator

We don't necessarily need to make the change to support this immediately, as long as there is a path to support it.

Contributor Author

My idea is that if we want coordinated fallback between resources and labels, we can do something similar to what we are doing with the placement group bundles:

  1. deprecate the current resources_bundle field
  2. add a new proto message ResourceSelector with 1 map<string, double> field
  3. add a new field with type repeated ResourceSelector in the current ResourceRequest proto message

In this way, the newly added resource_selectors and the existing label_selectors can work together to support the coordinated fallback mechanism. What do you think?

Collaborator

This makes sense to me -- we should align the interfaces for PG and regular tasks/actors as much as possible

Collaborator

Do you think we should do it now, or can it be added when we take on the resource fallback work?

Contributor Author

I think we can do that when we do the resource fallback work.
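
For readers following this thread, a rough Python sketch of what coordinated fallback could look like once the proposed `resource_selectors` exist. Everything here is hypothetical: the thread does not say how the two lists are combined, so pairing entry `i` of `resource_selectors` with entry `i` of `label_selectors` is an assumption, and label selectors are simplified to a key-to-allowed-values map (IN only, ANDed).

```python
from typing import Dict, List, Optional, Tuple

Labels = Dict[str, str]
Resources = Dict[str, float]
# Simplified stand-in for LabelSelector: label key -> allowed values.
Selector = Dict[str, List[str]]


def node_fits(labels: Labels, free: Resources,
              selector: Selector, resources: Resources) -> bool:
    # Every label constraint must hold and every resource must be available.
    labels_ok = all(labels.get(k) in allowed for k, allowed in selector.items())
    resources_ok = all(free.get(name, 0.0) >= amount
                       for name, amount in resources.items())
    return labels_ok and resources_ok


def pick_with_coordinated_fallback(
    nodes: Dict[str, Tuple[Labels, Resources]],
    resource_selectors: List[Resources],
    label_selectors: List[Selector],
) -> Optional[Tuple[int, str]]:
    # ASSUMPTION: option i pairs resource_selectors[i] with label_selectors[i].
    # Options are tried in order; the first one some node satisfies wins.
    for i, (resources, selector) in enumerate(
            zip(resource_selectors, label_selectors)):
        for node_id, (labels, free) in nodes.items():
            if node_fits(labels, free, selector, resources):
                return i, node_id
    return None
```

With this shape, a request can say "1 GPU on an A100 node, otherwise 4 CPUs on a node without an accelerator", so the resource requirement changes together with the label selector that ends up being used.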

}

message ResourceRequestByCount {
ResourceRequest request = 1;
int64 count = 2;
}

// A bundle selector used to specify the resource bundles that should be
// allocated together. All bundles in the same resource request require gang
// allocation semantics: they should be allocated all or nothing.
message BundleSelector {
// The list of resource requests that should be allocated together.
repeated ResourceRequest resource_requests = 1;
}
Comment on lines +104 to +107
Collaborator

Can you provide an example of how this would be used for fallback?

Contributor Author

Actually, the note about the fallback mechanism for GangResourceRequest is in the comment inside GangResourceRequest: https://github.com/ray-project/ray/pull/51771/files#diff-79033515bd7d0b4d966220d9c45290447e0aac8afd71b048a817b0d9bdfaa052R114-R117

  // The bundle requests. Multiple bundle selectors are for fallback mechanism.
  // When trying to find nodes that satisfy the bundle selector, the first bundle
  // selector should be tried first, if not found, the second bundle selector should be
  // tried, and so on.

I also just added more notes in ResourceRequest indicating that, in the bundle selector case, the fallback is done at the bundle selector level, and each resource request should have only 1 label selector.

Is this what you were looking for?
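
As an illustration of the fallback described above, here is a small Python sketch of gang allocation over an ordered list of bundle selectors. It is not the autoscaler's implementation. The function names are hypothetical, each bundle is reduced to its resource map (labels and placement constraints are ignored), and greedy first-fit placement is an assumption chosen for brevity.

```python
import copy
from typing import Dict, List, Optional, Tuple

Resources = Dict[str, float]
# Stand-in for BundleSelector: one resource map per ResourceRequest.
BundleSelector = List[Resources]


def try_gang_allocate(free: Dict[str, Resources],
                      bundles: BundleSelector) -> Optional[List[str]]:
    """Place every bundle or none; returns the node id chosen per bundle."""
    # Work on a copy so a failed attempt leaves the real state untouched.
    remaining = copy.deepcopy(free)
    placement: List[str] = []
    for bundle in bundles:
        for node_id, avail in remaining.items():
            if all(avail.get(r, 0.0) >= amt for r, amt in bundle.items()):
                for r, amt in bundle.items():
                    avail[r] = avail.get(r, 0.0) - amt
                placement.append(node_id)
                break
        else:
            # One bundle could not be placed, so the whole gang fails.
            return None
    return placement


def allocate_with_fallback(
    free: Dict[str, Resources],
    bundle_selectors: List[BundleSelector],
) -> Optional[Tuple[int, List[str]]]:
    # Try selectors in order; the first one that fits entirely is used.
    for index, selector in enumerate(bundle_selectors):
        placement = try_gang_allocate(free, selector)
        if placement is not None:
            return index, placement
    return None
```

For example, with one node offering 1 GPU and 4 CPUs, a first selector asking for two 1-GPU bundles fails as a whole, and a second selector asking for one 1-GPU bundle plus one 4-CPU bundle is used instead. Greedy first-fit can miss a feasible packing, so a real scheduler would likely need something stronger.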


message GangResourceRequest {
// a map from bundles to the number of bundles requested.
// DEPRECATED: bundle_selector should be used instead so that we can support fallback
// mechanism.
repeated ResourceRequest requests = 1;
// Metadata associated with the request for observability,
// e.g. placement group's strategy.
string details = 2;
// The bundle requests. Multiple bundle selectors are for fallback mechanism.
// When trying to find nodes that satisfy the bundle selector, the first bundle
// selector should be tried first, if not found, the second bundle selector should be
// tried, and so on.
repeated BundleSelector bundle_selectors = 3;
}

// Cluster resource constraint represents minimal cluster size requirement,