Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation for NAS job in Katib controller #398

Merged
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions cmd/manager/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,24 @@ func (s *server) GetSuggestions(ctx context.Context, in *api_pb.GetSuggestionsRe
return r, nil
}

func (s *server) ValidateSuggestionParameters(ctx context.Context, in *api_pb.ValidateSuggestionParametersRequest) (*api_pb.ValidateSuggestionParametersReply, error) {
if in.SuggestionAlgorithm == "" {
return &api_pb.ValidateSuggestionParametersReply{}, errors.New("No suggest algorithm specified")
}

conn, err := grpc.Dial("vizier-suggestion-"+in.SuggestionAlgorithm+":6789", grpc.WithInsecure())
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return specific error code here (e.g. Unavailable ) and set message.

Copy link
Member Author

@andreyvelich andreyvelich Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YujiOshima By default this Error https://github.com/kubeflow/katib/pull/398/files#diff-9e63f7b0791edcb4a53c944552b88970R120 will return Unavailable code. I think, we don't need to specify this error code there.
Right now, I handle this error here, as you said: b88030b#diff-7a381fdaeae1295724b74c0a04f7644cR386.
Please check, if it is ok.

return &api_pb.ValidateSuggestionParametersReply{}, err
}
defer conn.Close()

c := api_pb.NewSuggestionClient(conn)

r, err := c.ValidateSuggestionParameters(ctx, in)

return r, err
}

func (s *server) RegisterWorker(ctx context.Context, in *api_pb.RegisterWorkerRequest) (*api_pb.RegisterWorkerReply, error) {
wid, err := dbIf.CreateWorker(in.Worker)
return &api_pb.RegisterWorkerReply{WorkerId: wid}, err
Expand Down
511 changes: 318 additions & 193 deletions pkg/api/api.pb.go

Large diffs are not rendered by default.

46 changes: 46 additions & 0 deletions pkg/api/api.pb.gw.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

22 changes: 22 additions & 0 deletions pkg/api/api.proto
Original file line number Diff line number Diff line change
Expand Up @@ -212,10 +212,20 @@ service Manager {
get: "/api/Manager/GetSavedModels"
};
};
/**
* Validate Suggestion Parameters from Study Job.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain a suggestion service should return INVALID_ARGUMENT error when the parameter is invalid.

*/
rpc ValidateSuggestionParameters(ValidateSuggestionParametersRequest) returns (ValidateSuggestionParametersReply){
option (google.api.http) = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find that ValidateSuggestionParametersReply is used as return &api.ValidateSuggestionParametersReply{} all the time. can we drop ValidateSuggestionParametersReply, alway return nothing?
or you can use below struct to replace current one

message ValidateSuggestionParametersReply {
    bool is_valid 1;
    string msg 2;
 }

Copy link
Member Author

@andreyvelich andreyvelich Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hougangliu We don't need to return anything, like here:
https://github.com/kubeflow/katib/blob/master/cmd/manager/main.go#L213.
If parameter is not valid, we return INVALID_ARGUMENT Error message.

post: "/api/Manager/ValidateSuggestionParameters"
body: "*"
};
};
}

service Suggestion {
rpc GetSuggestions(GetSuggestionsRequest) returns (GetSuggestionsReply);
rpc ValidateSuggestionParameters(ValidateSuggestionParametersRequest) returns (ValidateSuggestionParametersReply);
}

service EarlyStopping {
Expand Down Expand Up @@ -784,3 +794,15 @@ message EarlyStoppingParameterSet {
message GetEarlyStoppingParameterListReply {
repeated EarlyStoppingParameterSet early_stopping_parameter_sets = 1;
}

message ValidateSuggestionParametersRequest {
StudyConfig study_config = 1;
string suggestion_algorithm = 2;
repeated SuggestionParameter suggestion_parameters = 3;
}

/**
* Return Invalid Argument Error if Suggestion Parameters are not Valid
*/
message ValidateSuggestionParametersReply {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is_valid 1;
string msg 2;

48 changes: 48 additions & 0 deletions pkg/api/api.swagger.json
Original file line number Diff line number Diff line change
Expand Up @@ -670,6 +670,33 @@
"Manager"
]
}
},
"/api/Manager/ValidateSuggestionParameters": {
"post": {
"summary": "* \nValidate Suggestion Parameters from Study Job.",
"operationId": "ValidateSuggestionParameters",
"responses": {
"200": {
"description": "",
"schema": {
"$ref": "#/definitions/apiValidateSuggestionParametersReply"
}
}
},
"parameters": [
{
"name": "body",
"in": "body",
"required": true,
"schema": {
"$ref": "#/definitions/apiValidateSuggestionParametersRequest"
}
}
],
"tags": [
"Manager"
]
}
}
},
"definitions": {
Expand Down Expand Up @@ -1486,6 +1513,27 @@
},
"description": "* \nUpdate a Status of Worker."
},
"apiValidateSuggestionParametersReply": {
"type": "object",
"title": "*\nReturn 1 if Suggestion Parameters is Valid, 0 if not"
},
"apiValidateSuggestionParametersRequest": {
"type": "object",
"properties": {
"study_config": {
"$ref": "#/definitions/apiStudyConfig"
},
"suggestion_algorithm": {
"type": "string"
},
"suggestion_parameters": {
"type": "array",
"items": {
"$ref": "#/definitions/apiSuggestionParameter"
}
}
}
},
"apiWorker": {
"type": "object",
"properties": {
Expand Down
31 changes: 31 additions & 0 deletions pkg/api/gen-doc/api.md
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,8 @@
- [Trial](#api.Trial)
- [UpdateWorkerStateReply](#api.UpdateWorkerStateReply)
- [UpdateWorkerStateRequest](#api.UpdateWorkerStateRequest)
- [ValidateSuggestionParametersReply](#api.ValidateSuggestionParametersReply)
- [ValidateSuggestionParametersRequest](#api.ValidateSuggestionParametersRequest)
- [Worker](#api.Worker)
- [WorkerFullInfo](#api.WorkerFullInfo)

Expand Down Expand Up @@ -1340,6 +1342,33 @@ Update a Status of Worker.



<a name="api.ValidateSuggestionParametersReply"/>

### ValidateSuggestionParametersReply
Return 1 if Suggestion Parameters is Valid, 0 if not






<a name="api.ValidateSuggestionParametersRequest"/>

### ValidateSuggestionParametersRequest



| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| study_config | [StudyConfig](#api.StudyConfig) | | |
| suggestion_algorithm | [string](#string) | | |
| suggestion_parameters | [SuggestionParameter](#api.SuggestionParameter) | repeated | |






<a name="api.Worker"/>

### Worker
Expand Down Expand Up @@ -1474,6 +1503,7 @@ https://cloud.google.com/service-infrastructure/docs/service-management/referenc
| ReportMetricsLogs | [ReportMetricsLogsRequest](#api.ReportMetricsLogsRequest) | [ReportMetricsLogsReply](#api.ReportMetricsLogsRequest) | Report a logs of metrics for workers. The logs for each worker must have timestamp and must be ordered in time series. When the log you reported are already reported before, it will be dismissed and get no error. |
| GetSavedStudies | [GetSavedStudiesRequest](#api.GetSavedStudiesRequest) | [GetSavedStudiesReply](#api.GetSavedStudiesRequest) | |
| GetSavedModels | [GetSavedModelsRequest](#api.GetSavedModelsRequest) | [GetSavedModelsReply](#api.GetSavedModelsRequest) | |
| ValidateSuggestionParameters | [ValidateSuggestionParametersRequest](#api.ValidateSuggestionParametersRequest) | [ValidateSuggestionParametersReply](#api.ValidateSuggestionParametersRequest) | Validate Suggestion Parameters from Study Job. |


<a name="api.Suggestion"/>
Expand All @@ -1484,6 +1514,7 @@ https://cloud.google.com/service-infrastructure/docs/service-management/referenc
| Method Name | Request Type | Response Type | Description |
| ----------- | ------------ | ------------- | ------------|
| GetSuggestions | [GetSuggestionsRequest](#api.GetSuggestionsRequest) | [GetSuggestionsReply](#api.GetSuggestionsRequest) | |
| ValidateSuggestionParameters | [ValidateSuggestionParametersRequest](#api.ValidateSuggestionParametersRequest) | [ValidateSuggestionParametersReply](#api.ValidateSuggestionParametersRequest) | |



Expand Down
66 changes: 66 additions & 0 deletions pkg/api/gen-doc/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -494,6 +494,14 @@ <h2>Table of Contents</h2>
<a href="#api.UpdateWorkerStateRequest"><span class="badge">M</span>UpdateWorkerStateRequest</a>
</li>

<li>
<a href="#api.ValidateSuggestionParametersReply"><span class="badge">M</span>ValidateSuggestionParametersReply</a>
</li>

<li>
<a href="#api.ValidateSuggestionParametersRequest"><span class="badge">M</span>ValidateSuggestionParametersRequest</a>
</li>

<li>
<a href="#api.Worker"><span class="badge">M</span>Worker</a>
</li>
Expand Down Expand Up @@ -2758,6 +2766,50 @@ <h3 id="api.UpdateWorkerStateRequest">UpdateWorkerStateRequest</h3>



<h3 id="api.ValidateSuggestionParametersReply">ValidateSuggestionParametersReply</h3>
<p>Return 1 if Suggestion Parameters is Valid, 0 if not</p>





<h3 id="api.ValidateSuggestionParametersRequest">ValidateSuggestionParametersRequest</h3>
<p></p>


<table class="field-table">
<thead>
<tr><td>Field</td><td>Type</td><td>Label</td><td>Description</td></tr>
</thead>
<tbody>

<tr>
<td>study_config</td>
<td><a href="#api.StudyConfig">StudyConfig</a></td>
<td></td>
<td><p> </p></td>
</tr>

<tr>
<td>suggestion_algorithm</td>
<td><a href="#string">string</a></td>
<td></td>
<td><p> </p></td>
</tr>

<tr>
<td>suggestion_parameters</td>
<td><a href="#api.SuggestionParameter">SuggestionParameter</a></td>
<td>repeated</td>
<td><p> </p></td>
</tr>

</tbody>
</table>




<h3 id="api.Worker">Worker</h3>
<p>A process of evaluation for a trial.</p><p>Types of worker supported by Katib are k8s Job, TF-Job, and Pytorch-Job.</p>

Expand Down Expand Up @@ -3188,6 +3240,13 @@ <h3 id="api.Manager">Manager</h3>
<td><p></p></td>
</tr>

<tr>
<td>ValidateSuggestionParameters</td>
<td><a href="#api.ValidateSuggestionParametersRequest">ValidateSuggestionParametersRequest</a></td>
<td><a href="#api.ValidateSuggestionParametersReply">ValidateSuggestionParametersReply</a></td>
<td><p>Validate Suggestion Parameters from Study Job.</p></td>
</tr>

</tbody>
</table>

Expand All @@ -3206,6 +3265,13 @@ <h3 id="api.Suggestion">Suggestion</h3>
<td><p></p></td>
</tr>

<tr>
<td>ValidateSuggestionParameters</td>
<td><a href="#api.ValidateSuggestionParametersRequest">ValidateSuggestionParametersRequest</a></td>
<td><a href="#api.ValidateSuggestionParametersReply">ValidateSuggestionParametersReply</a></td>
<td><p></p></td>
</tr>

</tbody>
</table>

Expand Down
Loading