Adaptive Load FakeStepController doom update#492
Adaptive Load FakeStepController doom update#492dubious90 merged 45 commits intoenvoyproxy:masterfrom
Conversation
update from master
merge from upstream
merge from upstream
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…double Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…variable_setter Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
merge from upstream
…ent.Output turns out not to include the status Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…plugin names, log thresholds only once per session Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
dubious90
left a comment
There was a problem hiding this comment.
Mostly one big question that I want more clarity on.
| nighthawk::client::CommandLineOptions command_line_options_template) | ||
| : is_converged_{false}, is_doomed_{false}, fixed_rps_value_{config.fixed_rps_value()}, | ||
| : input_setting_failure_countdown_{config.artificial_input_setting_failure_countdown()}, | ||
| config_{std::move(config)}, is_converged_{false}, is_doomed_{false}, |
There was a problem hiding this comment.
not a complaint, but what is std::move accomplishing here, given that this isn't a smart pointer?
There was a problem hiding this comment.
Funny story, this was actually required by clang-tidy. If the function implementation is making a copy of a const reference parameter, clang-tidy tells you to change the parameter to be pass by value so it's obvious to the caller that a copy is happening. The argument gets copied at the call site into an unnamed temporary value. Then in the constructor we can actually move this temporary value into the field without incurring the cost of a second copy.
| int32 fixed_rps_value = 1; | ||
| // Artificial error that the plugin factory should return during validation. Optional. | ||
| google.rpc.Status artificial_validation_failure = 2; | ||
| // Artificial error that should be returned from GetCurrentCommandLineOptions(). Optional. |
There was a problem hiding this comment.
Can we enhance this comment to indicate the relation to the countdown?
| // Artificial error that the plugin factory should return during validation. Optional. | ||
| google.rpc.Status artificial_validation_failure = 2; | ||
| // Artificial error that should be returned from GetCurrentCommandLineOptions(). Optional. | ||
| google.rpc.Status artificial_input_setting_failure = 3; |
There was a problem hiding this comment.
I think I'm having trouble understanding the full use case here. If this is 3, then we are going to succeed 3 times, then fail on the 4th attempt. Makes sense, but I'm not sure I understand why that's useful. What is the test that you're supporting by creating it in this way?
There was a problem hiding this comment.
GetCurrentCommandLine() can return an error status that should be handled cleanly by the main controller loop. The controller calls GetCurrentCommandLine() repeatedly during the adjusting stage until the step controller says it's converged, and then in the testing stage we call GetCurrentCommandLine() one last time, reusing the last converged value.
In order to test handling of these errors, we need the FakeStepController to return successfully from GetCurrentCommandLine() during the adjusting stage, but then start returning errors just in time for the testing stage. We don't have any way to update the FakeStepController during the run, so it has to somehow be programmed up front to behave differently at different times.
An alternative would be for magic values in UpdateAndRecompute() to trigger GetCurrentCommandLine() error behavior. We already use magic values to control convergence and doom. But there's only so much information we can encode in metric score doubles without having it get out of hand. Currently the UpdateAndRecompute() behavior is: zero scores=non-converged non-doomed, any positive score=converged, any negative score=doomed.
This trick wouldn't be necessary if the step controller were aware of what stage it was operating in.
|
@mum4k Can you give this a review and assign back to me when done? |
|
@eric846 please address / respond to the comments from @dubious90 as that can help me by providing additional context for the review. |
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
…date Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
oschaaf
left a comment
There was a problem hiding this comment.
Just a const nit, LGTM otherwise. Left a question that popped up over at #466 (comment) as that's out of scope here.
test/adaptive_load/fake_plugins/fake_step_controller/fake_step_controller_test.cc
Outdated
Show resolved
Hide resolved
mum4k
left a comment
There was a problem hiding this comment.
Looks good with just a few nits.
| * | ||
| * @param fixed_rps_value Value for RPS to set in the FakeStepControllerConfig proto until the | ||
| * countdown reaches zero. | ||
| * @param artificial_input_setting_failure An error status. |
There was a problem hiding this comment.
Can we expand this comment, explaining what is the meaning of the error status, i.e. what it is used for?
| @@ -22,7 +22,7 @@ class FakeStepController : public StepController { | |||
| * @param config FakeStepControllerConfig proto for setting the fixed RPS value. | |||
There was a problem hiding this comment.
(pre-existing, optional) We should probably mention in the class comment that this class isn't thread-safe. At least I am assuming it isn't meant to be.
Signed-off-by: eric846 <56563761+eric846@users.noreply.github.com>
oschaaf
left a comment
There was a problem hiding this comment.
Thanks for resolving the const nit. LGTM for my part of the review.
Updates FakeStepController:
FakeStepController::UpdateAndRecompute()receives aBenchmarkResultwith a negative metric score. PreviouslyBenchmarkResultcould contain an error status, and we used the error status to trigger simulated doom. Now errors do not get passed toStepController::UpdateAndRecompute(), so we need a new mechanism.InputVariableSetterfailure inGetCurrentCommandLineOptions(). The artificial failure is specified in the config proto.GetCurrentCommandLineOptions()returns successfully untilUpdateAndRecompute()has been called a configured number of times, then starts returning the simulated input variable setter failure mentioned above. This simple autonomous behavior is needed to test error handling in the testing stage without triggering an early exit in the adjusting stage; the adaptive load controller can't be paused in the middle of the test to reconfigure theFakeStepControllerto change its behavior.The behavior of
FakeStepControllerafter anUpdateAndRecompute()with aBenchmarkResultcontaining the given metric scores:Part 3 of splitting PR #483.