-
Notifications
You must be signed in to change notification settings - Fork 130
Fault injection macros and functionality (plus example) #264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
153f42e
e9cc019
97dc369
a4f74ae
adad3bb
e86483a
e083a0d
203747d
9a72ee0
f5bb491
36050ca
2e99f4c
75009d9
b03d89e
6a5daad
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -131,6 +131,55 @@ extern "C" | |
| # define RCUTILS_UNLIKELY(x) (x) | ||
| #endif // _WIN32 | ||
|
|
||
| #if defined RCUTILS_ENABLE_FAULT_INJECTION | ||
| #include "rcutils/testing/fault_injection.h" | ||
|
|
||
| /** | ||
| * \def RCUTILS_CAN_RETURN_WITH_ERROR_OF | ||
| * Indicating macro that the function intends to return possible error value. | ||
| * | ||
| * Put this macro as the first line in the function. For example: | ||
| * | ||
| * int rcutils_function_that_can_fail() { | ||
| * RCUTILS_CAN_RETURN_WITH_ERROR_OF(RCUTILS_RET_INVALID_ARGUMENT); | ||
| * ... // rest of function | ||
| * } | ||
| * | ||
| * For now, this macro just simply calls `RCUTILS_MAYBE_RETURN_ERROR` if fault injection is | ||
| * enabled. However, for source code, the macro annotation `RCUTILS_CAN_RETURN_WITH_ERROR_OF` helps | ||
| * clarify that a function may return a value signifying an error and what those are. | ||
| * | ||
| * In general, you should only include a return value that originates in the function you're | ||
| * annotating instead of one that is merely passed on from a called function already annotated with | ||
| *`RCUTILS_CAN_RETURN_WITH_ERROR_OF`. If you are passing on return values from a called function, | ||
| * but that function is not annotated with `RCUTILS_CAN_RETURN_WITH_ERROR_OF`, then you might | ||
| * consider annotating that function first. If for some reason that is not desired or possible, | ||
| * then annotate your function as if the return values you are passing on originated from your | ||
| * function. | ||
| * | ||
| * If the function can return multiple return values indicating separate failure types, each one | ||
| * should go on a separate line. | ||
| * | ||
| * If in your function, there are expected effects on output parameters that occur during | ||
| * the failure case, then it will introduce a discrepency between fault injection testing and | ||
| * production operation. This is because the fault injection will cause the function to return | ||
| * where this macro is used, not at the location the error values are typically returned. To help | ||
| * protect against this scenario you may consider adding unit tests that checks your function does | ||
|
brawner marked this conversation as resolved.
Outdated
|
||
| * not modify output parameters when it actually returns a failing error code. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @brawner I think this the a strong (the strongest?) constrain. This will cover 90% of our use cases, but, in the most general case, not all errors occur during early checks and not all errors can (should?) have zero observable side-effects. Could the macro take some statements to mimic such side effects if any?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that may be a reasonable extension to add. The difficulty though is that there may be multiple instances of returning the same error code in a function (for example While many cases might have side effects after returning through an error case, my limited experience with this code base suggests those typically result in an undefined state more than an expected partially completed state. I think for this case, we'll just have to build up a list of examples and what the requirements for those functions would be. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough. I'd like to point out though that while:
often holds true, IMHO it's bad practice. Really bad practice. If your program is left in an undefined state after returning an error, you may as well terminate it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Meta: that's not always true or possible, finalization functions being a good counterexample. I don't think the limitation should be addressed here though (or ever if mocks help us cover those cases). |
||
| * | ||
| * If your function is void, this macro can be used without parameters. However, for the above | ||
| * reasoning, there should be no side effects on output parameters for all possible early returns. | ||
| * | ||
| * \param error_return_value the value returned as a result of an error. It does not need to be | ||
| * a rcutils_ret_t type. It could also be NULL, -1, a string error message, etc | ||
| */ | ||
| # define RCUTILS_CAN_RETURN_WITH_ERROR_OF(error_return_value) \ | ||
| RCUTILS_MAYBE_RETURN_ERROR(error_return_value); | ||
|
|
||
| #else | ||
| # define RCUTILS_CAN_RETURN_WITH_ERROR_OF(error_return_value) | ||
| #endif // defined RCUTILS_ENABLE_FAULT_INJECTION | ||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| // Copyright 2020 Open Source Robotics Foundation, Inc. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, software | ||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| #ifndef RCUTILS__TESTING_MACROS_H_ | ||
| #define RCUTILS__TESTING_MACROS_H_ | ||
| #include <stdio.h> | ||
|
|
||
| #ifdef __cplusplus | ||
| extern "C" | ||
| { | ||
| #endif | ||
|
|
||
| void _rcutils_set_fault_injection_count(int count); | ||
|
|
||
| int _rcutils_maybe_fail(); | ||
|
|
||
| #if defined RCUTILS_ENABLE_FAULT_INJECTION | ||
|
|
||
| /** | ||
| * \def RCUTILS_MAYBE_RETURN_ERROR | ||
| * \brief This macro checks and decrements a static global variable atomic counter and returns | ||
| * `return_value_on_error` if 0. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @brawner one thing this scheme cannot do is force an error during cleanup after one or more errors have been forced. What about if, in addition to forcing a single function to fail, we make all following functions fail as well? Like a flag to prevent the count to ever be decremented below 0. I envision the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are some examples where a failure in one location is likely to result in a failure in another location. For example, allocations may continue to fail, or file system functions may continue to fail. For the sake of coverage, there are locations where it would take two failures to reach some lines of code. Your example of cleanup is a good one. I think there are a few of potential solutions:
RCUTILS_SET_FAULT_COUNT(0, 0)For example, would trip a second failure immediately after its first failure. This would let you test more types of failures, and then we could use this type of macro to check that code is not only one-fault tolerant, but two-fault tolerant, three-fault tolerant etc.
RCUTILS_SET_FAULT_INJECTION_NAMED(...);
RCUTILS_SET_FAULT_INJECTION_NAMED(...);
... // Run test codeIf we do introduce this one at a later date, I could see it replacing the unnamed version macro entirely. This idea requires multiple unit tests for each targeted failed injection, but is also the only option that allows for targeted injections in the first place. Do you have any thoughts about the other options suitability for the situation you are describing? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think solution (2.) would be best. We don't really know what we're hitting, so N-fault tolerant sounds as good as it gets.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This can be added in a followup PR when the functionality becomes necessary for better coverage. |
||
| * | ||
| * This macro is not a function itself, so when this macro returns it will cause the calling | ||
| * function to return with the return value. | ||
| * | ||
| * Set the counter with `RCUTILS_SET_FAULT_INJECTION_COUNT`. If the count is less than 0, then | ||
| * `RCUTILS_MAYBE_RETURN_ERROR` will not cause an early return. | ||
| * | ||
| * This macro is thread-safe, and ensures that at most one invocation results in a failure for each | ||
| * time the fault injection counter is set with `RCUTILS_SET_FAULT_INJECTION_COUNT` | ||
| * | ||
| * \param return_value_on_error the value to return in the case of fault injected failure. | ||
| */ | ||
| #define RCUTILS_MAYBE_RETURN_ERROR(return_value_on_error) \ | ||
| if (0 == _rcutils_maybe_fail()) { \ | ||
| return return_value_on_error; \ | ||
| } | ||
|
|
||
| /** | ||
| * \def RCUTILS_SET_FAULT_INJECTION_COUNT | ||
| * \brief Atomically set the fault injection counter. | ||
| * | ||
| * There will be at most one fault injected failure per call to RCUTILS_SET_FAULT_INJECTION_COUNT. | ||
| * To test all reachable fault injection locations, call this macro inside a for loop with | ||
| * sufficient iterations setting count to the loop iteration variable. For example: | ||
| * | ||
| * for (int i = 0; i < SUFFICIENTLY_LARGE_ITERATION_COUNT; ++i) { | ||
| * RCUTILS_SET_FAULT_INJECTION_COUNT(i); | ||
| * ... // Call function under test | ||
| * } | ||
| * | ||
| * Where SUFFICIENTLY_LARGE_ITERATION_COUNT is a value larger than the maximum expected calls to | ||
| * `RCUTILS_MAYBE_RETURN_ERROR`. In your fault injection unit test, it is recommended to run one | ||
| * last iteration with the fault injection counter set to this maximum value and validate that the | ||
| * results of the call to the function under test would result in the same thing as if no fault | ||
| * injection was used. This will help ensure that this maximum value is suitable and will call | ||
| * attention to maintainers if it needs to be increased because more instances of | ||
| * RCUTILS_MAYBE_RETURN_ERROR were introduced. | ||
|
brawner marked this conversation as resolved.
Outdated
|
||
| * | ||
| * \param count The count to set the fault injection counter to. If count is negative, then fault | ||
| * injection errors will be disabled. The counter is globally initialized to -1. | ||
| */ | ||
| #define RCUTILS_SET_FAULT_INJECTION_COUNT(count) \ | ||
| _rcutils_set_fault_injection_count(count); | ||
|
|
||
| #else // RCUTILS_ENABLE_FAULT_INJECTION | ||
|
|
||
| #define RCUTILS_SET_FAULT_INJECTION_COUNT(count) | ||
|
|
||
| #define RCUTILS_MAYBE_RETURN_ERROR(msg, error_statement) | ||
|
|
||
| #endif // defined RCUTILS_ENABLE_FAULT_INJECTION | ||
|
|
||
| #ifdef __cplusplus | ||
| } | ||
| #endif | ||
|
|
||
| #endif // RCUTILS__TESTING_MACROS_H_ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,6 +39,8 @@ rcutils_snprintf(char * buffer, size_t buffer_size, const char * format, ...) | |
| int | ||
| rcutils_vsnprintf(char * buffer, size_t buffer_size, const char * format, va_list args) | ||
| { | ||
| RCUTILS_CAN_RETURN_WITH_ERROR_OF(-1); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main concern here is that we are going to litter our code with these macros. It's not the end of the world, but it is noisy and can make the code harder to read. I have 2 separate suggestions:
The benefit to both solutions above is that there is no changes to the code, just changes to the tests. But I'd also like to hear what others think.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I pushed this PR last night, and intended to give a more full writeup. Please take a look at the above PR description for some more information as well as possible extensions. The approach taken in this PR allows for immediate cross-platform, cross-language utilization with simple unit tests with no external dependencies. You can write a unit test that easily verifies your code is hardened against failures in upstream libraries. See ros2/rcl_logging#48 for an example. Another approach was to add an I took care in the design of This PR is intended as a starting point, and I want to make it as useful as possible. There are several possible feature additions that I think could fill it out, but I would like to get some more buy in as well.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Apologies for using potentially harsh language above. That wasn't my intention. As this stands, I think you've done a good job of minimizing the impact of these macros on the "live" code (that is, the real code we are trying to test and that we ship). That is, I don't see how this method of mocking could be any simpler than adding a single macro line to any function/method that you want to mock out. As we've discussed elsewhere, this also sets the stage for having the macro enabled for our nightly and CI builds, but disabled for our packaging and debian builds. My biggest concern is deploying this kind of change across a large number of functions and methods throughout the ROS 2 codebase. Adding things like this to every function places additional mental burden on contributors, as all readers have to understand why they are here and what they do. It also makes it somewhat difficult to move away from this solution in the future if we find a better way. I want to be clear that I am not totally against this change. However, I think some of the other options we are pursuing don't have the downsides of this solution, so I'd prefer to investigate those other options first. If those solutions turn out not to work for one reason or another, this is an acceptable path forward. Does that clear up my thinking on this?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you, it does help clarify your thinking. My hope is that this sort of feature is used in conjunction with mocking like what @hidmic is investigating. That sort of testing would allow for more targeted failures, while this approach especially with the for loop example might be more like throwing a grenade at your code and seeing if it survives. You're absolutely right that there are a large number of places where the My suggestion with this sort of change is to start small and expand it as its role in this codebase is better understood. That's partially the reason why I only introduce one flavor of That macro is empty if fault injection is not enabled, which I think helps make it easier to remove in the future. The example unit test in rcl_logging_spdlog would pass if I also agree that seeing this macro at the top of a function would be strange and unusual to many developers, particularly newer ROS 2 users who are unfamiliar with the code. While I tried to be careful about the macro name, I'm also very open to rewording There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@clalancette I think this serves a different use case. Mocking is good for white-box unit testing, while this is performing black-box integration testing to see how the system, with all its side effects, behaves in an exceptional event (akin to exception safety verification). Could we use mocks to implement it? Yes, but we'd end up coupling a package to its dependencies' implementation and recreating side effects -- possible, but much harder. IMHO, |
||
|
|
||
| if (NULL == format) { | ||
| errno = EINVAL; | ||
|
brawner marked this conversation as resolved.
|
||
| return -1; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| // Copyright 2020 Open Source Robotics Foundation, Inc. | ||
| // | ||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||
| // you may not use this file except in compliance with the License. | ||
| // You may obtain a copy of the License at | ||
| // | ||
| // http://www.apache.org/licenses/LICENSE-2.0 | ||
| // | ||
| // Unless required by applicable law or agreed to in writing, software | ||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| // See the License for the specific language governing permissions and | ||
| // limitations under the License. | ||
|
|
||
| #include "rcutils/testing/fault_injection.h" | ||
|
|
||
| #include "rcutils/stdatomic_helper.h" | ||
|
|
||
| static atomic_int_least64_t g_rcutils_fault_injection_count = -1; | ||
|
|
||
| int _rcutils_maybe_fail() | ||
| { | ||
| bool set_atomic_success = false; | ||
| int_least64_t current_count = -1; | ||
| rcutils_atomic_load(&g_rcutils_fault_injection_count, current_count); | ||
| do { | ||
| // A fault_injection_count less than 0 means that maybe_fail doesn't fail, so just return. | ||
| if (current_count < 0) { | ||
| return current_count; | ||
| } | ||
|
|
||
| // Otherwise decrement by one, but do so in a thread-safe manner so that exactly one calling | ||
| // thread gets the 0 case. | ||
| int_least64_t desired_count = current_count - 1; | ||
| rcutils_atomic_compare_exchange_strong( | ||
| &g_rcutils_fault_injection_count, set_atomic_success, ¤t_count, desired_count); | ||
| } while (!set_atomic_success); | ||
| return current_count; | ||
| } | ||
|
|
||
| void _rcutils_set_fault_injection_count(int count) | ||
| { | ||
| rcutils_atomic_store(&g_rcutils_fault_injection_count, count); | ||
| } |
Uh oh!
There was an error while loading. Please reload this page.