
Stub for CFE_SB_SendMsg does not always save MsgPtr argument value #937

Closed
asgibson opened this issue Oct 1, 2020 · 16 comments

Comments

@asgibson
Contributor

asgibson commented Oct 1, 2020

Describe the bug
If a negative value is set as the return from the CFE_SB_SendMsg stub, the MsgPtr argument is not saved to the context. The passed-in message therefore cannot be verified when a negative result is forced.

To Reproduce
Steps to reproduce the behavior:

  1. Write a test that gives CFE_SB_SendMsg a context and forces CFE_SB_SendMsg to return a negative value
  2. Check that the context contains the expected pointer - FAIL
  3. Change the return value to >=0
  4. Check that the context contains the expected pointer - PASS

Expected behavior
No matter the forced return value, MsgPtr should be copied to the context. The check for status >= 0 appears to be left over from the previous use of:

    if (status >= 0)
    {
        UT_Stub_CopyFromLocal(UT_KEY(CFE_SB_SendMsg), MsgPtr->Byte,
                CFE_SB_StubMsg_GetMetaData(MsgPtr)->TotalLength);
    }

Code snips

int32 CFE_SB_SendMsg(CFE_SB_Msg_t *MsgPtr)
{
    UT_Stub_RegisterContext(UT_KEY(CFE_SB_SendMsg), MsgPtr);

    int32 status = CFE_SUCCESS;

    /*
     * Create a context entry so a hook function
     * could do something useful with the message
     */
    status = UT_DEFAULT_IMPL(CFE_SB_SendMsg);

    if (status >= 0)
    {
        UT_Stub_CopyFromLocal(UT_KEY(CFE_SB_SendMsg), &MsgPtr, sizeof(MsgPtr));
    }

    return status;
}
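
For reference, a minimal sketch of the change being requested here, assuming the fix is simply to drop the conditional so the pointer is copied to the data buffer regardless of the forced return value (an editorial illustration only, not an accepted patch):

int32 CFE_SB_SendMsg(CFE_SB_Msg_t *MsgPtr)
{
    int32 status;

    /* Register the argument so hook functions can still inspect it */
    UT_Stub_RegisterContext(UT_KEY(CFE_SB_SendMsg), MsgPtr);

    status = UT_DEFAULT_IMPL(CFE_SB_SendMsg);

    /* Copy the pointer unconditionally so the test can verify it even
     * when an error return is forced */
    UT_Stub_CopyFromLocal(UT_KEY(CFE_SB_SendMsg), &MsgPtr, sizeof(MsgPtr));

    return status;
}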

System observed on:
RHEL 7.6

Reporter Info
Alan Gibson NASA GSFC/587

@asgibson asgibson changed the title Stub for CFE_SB_SendMsg should always save MsgPtr argument value Stub for CFE_SB_SendMsg does not always save MsgPtr argument value Oct 2, 2020
@skliper skliper removed the bug label Nov 18, 2020
@skliper
Contributor

skliper commented Nov 18, 2020

It's not clear, in a coverage test, what the value is of checking the pointer when error testing if you've already checked it for the success case. With just changing the error code returned from CFE_SB_SendMsg, I'm not seeing a huge benefit to checking the pointer again. If it's a unique requirement for your test, could you implement a hook to meet your needs? Removed the bug label.

@asgibson
Contributor Author

My tests typically do not use the same pointer for success and fail conditions, which is why I want to check it. I also check fail conditions first, not success. I don't see the need to create a hook for this operation when the stub can easily do it.

Why does it matter if we put an argument into the context on every call? What are we saving by only doing it on success? It appears we are sacrificing granularity for no reason.

@jphickey
Contributor

My tests typically do not use the same pointer for success and fail conditions

I'm confused at this statement, because in this API the pointer is supplied by the application under test. It sends a message that it locally generated. Maybe if you had a specific test case you could share, it would be clearer?

With the current logic, the buffer associated with CFE_SB_SendMsg will end up containing pointers to the messages that were actually sent (or at least the unit under test thinks were sent, i.e. for which the stub returned success).

With the proposed logic of removing the conditional, the buffer will contain all messages it attempted to send, whether they were successfully sent or not, and one wouldn't be able to tell the difference if there was more than one in the buffer.

The established pattern of stubs is typically that they don't transfer/process/store input or output when configured to fail. In cases of an output it is often set to a fixed value (e.g. 0xDEADBEEF) such that if the app actually uses it, it will be caught.

On the input side.... I do think the current approach aligns with the way other stubs work. For instance consider OS_write ... this stub only stores to the UT buffer what it (pretends to) actually write, not what the unit attempted to write. The point is that an app may (and should?) resume the write in a second invocation and complete the data output when OS_write doesn't output the full amount in one call, and a test case could verify the total output still worked correctly.

The same should logically be true of CFE_SB_SendMsg() ... if this returns an error, the unit/app may actually handle the error and send something else. So it makes sense to only store buffers which are purported to be "successfully sent", not the ones that the unit attempted to send but were rejected.

Why does it matter if we put an argument into the context on every call? What are we saving by only doing it on success? It appears we are sacrificing granularity for no reason.

This is for the reasons above - the stub call may get invoked multiple times with a mixture of return cases (deferred retcode, forced retcode, etc). If everything gets stored unconditionally then it is not possible to tell which ones were success cases and which ones were failure cases.

All that being said, I don't have too strong a feeling about it one way or the other, because this is a message-oriented call whereas something like OS_write is a stream-oriented call. So the message(s) in the buffer should still be identifiable either way - assuming the unit under test sets the header correctly and all (which might be too big of an assumption).

But I'm also not sure of the resistance to writing a hook function for this. The design goal is that hook functions should be simple and easy to write if you don't like the behavior of the stub for your specific test case. That's the whole point, different test scenarios might not be ideal for the single unified stub behavior, so UT assert has an easy way to customize it, to avoid thrashing around on the common stubs and potentially breaking other tests.

@jphickey
Contributor

If I'm understanding correctly you should be able to accomplish what you are looking for with a simple hook:

static int32 UT_SendMsgFail_Hook(void *UserObj, int32 StubRetcode, uint32 CallCount, const UT_StubContext_t *Context,
                                 va_list va)
{
    CFE_SB_Msg_t* MsgPtr;

    if (StubRetcode != CFE_SUCCESS)
    {
        /* Capture the pointer for "failed" sends in a buffer keyed off the hook */
        MsgPtr = UT_Hook_GetArgValueByName(Context, "MsgPtr", CFE_SB_Msg_t*);
        UT_Stub_CopyFromLocal(UT_KEY(UT_SendMsgFail_Hook), &MsgPtr, sizeof(MsgPtr));
    }

    return StubRetcode;
}

This will store "failed" messages to an alternate data buffer associated with UT_KEY(UT_SendMsgFail_Hook) whereas all "successfully" sent messages will get stored in the data buffer associated with UT_KEY(CFE_SB_SendMsg) as they currently are.

(Of course one must consider that these are storing the pointers, not the actual message content, which might go out of scope after the test case ends, so there's that issue too)
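
For context, a hook like this would typically be attached with UT_SetVaHookFunction and given its own data buffer before the unit under test runs; a hedged usage sketch (the variable name context_FailedMsgPtr and the chosen error code are illustrative):

CFE_SB_Msg_t *context_FailedMsgPtr = NULL;

/* Route pointers from "failed" sends to a buffer keyed off the hook itself */
UT_SetDataBuffer(UT_KEY(UT_SendMsgFail_Hook), &context_FailedMsgPtr,
                 sizeof(context_FailedMsgPtr), false);
UT_SetVaHookFunction(UT_KEY(CFE_SB_SendMsg), UT_SendMsgFail_Hook, NULL);
UT_SetForceFail(UT_KEY(CFE_SB_SendMsg), CFE_SB_BUF_ALOC_ERR);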

@asgibson
Contributor Author

This just revolves back to our fundamental disagreement about what a unit test is and how it should work. All stubs for unit testing should just report what they receive and output what they are told, in my view. Additional functionality should be done by hooks (and yes, I consider deciding whether to save arguments to the context as included in that). Unit tests should be independent of each other, with resets in between, so there would be no multiple messages in the context in that case, unless the stub was called multiple times within a single function under test (whereupon I would make sure the context that I set as the data buffer could handle that).

I'm confused at this statement, because in this API the pointer is supplied by the application under test. It sends a message that it locally generated. Maybe if you had a specific test case you could share, it would be clearer?

My statement may not have been the best example (although if I dug further I'm sure I could find the one that led me to this, but it has been 6 weeks and many written tests ago), but consider this function:

static void CF_HkCmd(void)
{
    CFE_SB_TimeStampMsg((CFE_SB_Msg_t*)&CF_AppData.hk);   
    /* return value ignored */ CFE_SB_SendMsg((CFE_SB_Msg_t*)&CF_AppData.hk);
}

Since this situation is a "don't care" on the return value, I should be able to determine the pointer received by CFE_SB_SendMsg regardless of whether or not the function returned success. My unit test for this function should be the same no matter the return value of CFE_SB_SendMsg -- success or fail.

One may then say, just set it to CFE_SUCCESS and be done, but that does not test the behavior of the code which "does not care" about the return value. Therefore, I should be able to return any value and have it pass and be able to ascertain the pointer received by CFE_SB_SendMsg. I should even be able to use a random return value and always get the same result because that is the behavior of this particular code under test.
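
To illustrate, the arrangement being described would look roughly like this (the error code chosen is arbitrary); with the stub as currently written the pointer copy is skipped whenever the forced return is negative, which is the subject of this ticket:

CFE_SB_Msg_t *context_CFE_SB_SendMsg = NULL;

/* Any forced return value should be acceptable to the code under test */
UT_SetForceFail(UT_KEY(CFE_SB_SendMsg), CFE_SB_BUF_ALOC_ERR);
UT_SetDataBuffer(UT_KEY(CFE_SB_SendMsg), &context_CFE_SB_SendMsg,
                 sizeof(context_CFE_SB_SendMsg), false);

CF_HkCmd();

/* With today's stub this assert fails, because the copy only happens when status >= 0 */
UtAssert_True(context_CFE_SB_SendMsg == (CFE_SB_Msg_t *)&CF_AppData.hk,
              "CFE_SB_SendMsg received the hk message pointer");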

@jphickey
Contributor

All stubs for unit testing should just report what they receive and output what they are told

While I see your point here (I think), it's one of those cases where the reality can sometimes get in the way, WRT side effects that the function is supposed to have. The examples I often cite are things like memset and memcpy .... if stubs for these functions strictly reported their inputs and returned a value while doing nothing else, it would cause major failures - because later code probably relies on them actually having set or copied memory, respectively. This is why many stubs must perform some sort of nonzero action.

While I'm not entirely against simplifying the stub here, I just still can't see how allowing one to retroactively check the message pointer for cases where the message was not sent is helping to make better test cases - because it seems to be testing things that wouldn't even matter in real FSW (i.e. if the message wasn't sent, nobody should care what its content was; it wasn't sent). You've also got the scoping issue to contend with (mentioned earlier) where the message can be a local stack object that is gone by the time the unit returns. In this light, even the current stub code is broken for this type of call.

It seems in your example one can also simply check that CFE_SB_SendMsg was invoked, using UT_GetStubCount(UT_KEY(CFE_SB_SendMsg)).

If your test really cares about message content, it's best to use a hook, as this avoids the scope problem too (it's still in scope when the hook runs).
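
For example, a call-count check along these lines (a minimal sketch):

/* Verify the unit attempted the send, without inspecting the message itself */
UtAssert_True(UT_GetStubCount(UT_KEY(CFE_SB_SendMsg)) == 1,
              "CFE_SB_SendMsg was invoked once");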

@asgibson
Contributor Author

My test does not care about the message content because the function being tested does not care. The unit test is only verifying the function's behavior. In retrospect, I do believe the code mentioned above was what caused me to put in this ticket; here is my test:

void Test_CF_HkCmd_TimestampsAndSendsMessageWith_CF_AppData_hk(void)
{
    /* Arrange */
    CFE_SB_Msg_t*   context_CFE_SB_TimeStampMsg;
    CFE_SB_Msg_t*   context_CFE_SB_SendMsg;
    
    UT_SetDataBuffer(UT_KEY(CFE_SB_TimeStampMsg), &context_CFE_SB_TimeStampMsg, 
      sizeof(context_CFE_SB_TimeStampMsg), false);
    //TODO: Should be able to be any result not just CFE_SUCCESS
    UT_SetForceFail(UT_KEY(CFE_SB_SendMsg), CFE_SUCCESS); 
    UT_SetDataBuffer(UT_KEY(CFE_SB_SendMsg), &context_CFE_SB_SendMsg, 
      sizeof(context_CFE_SB_SendMsg), false);
      
    /* Act */
    CF_HkCmd();
    
    /* Assert */
    UtAssert_True(context_CFE_SB_TimeStampMsg == &CF_AppData.hk,
      "CFE_SB_TimeStampMsg received %p and should be %p (&CF_AppData.hk)", 
      context_CFE_SB_TimeStampMsg, &CF_AppData.hk);
    UtAssert_True(context_CFE_SB_SendMsg == &CF_AppData.hk,
      "CFE_SB_SendMsg received %p and should be %p (&CF_AppData.hk)", 
      context_CFE_SB_SendMsg, &CF_AppData.hk);
} /* end Test_CF_HkCmd_TimestampsAndSendsMessageWith_CF_AppData_hk */

@jphickey
Contributor

Right - and I think this is where we disagree about what should be "asserted" upon in a test. Specifically from above:

    /* Assert */
    UtAssert_True(context_CFE_SB_TimeStampMsg == &CF_AppData.hk,
      "CFE_SB_TimeStampMsg received %p and should be %p (&CF_AppData.hk)", 
      context_CFE_SB_TimeStampMsg, &CF_AppData.hk);
    UtAssert_True(context_CFE_SB_SendMsg == &CF_AppData.hk,
      "CFE_SB_SendMsg received %p and should be %p (&CF_AppData.hk)", 
      context_CFE_SB_SendMsg, &CF_AppData.hk);

I would contend that there is no requirement that the application call the send message function with a specific buffer. It just needs to call the function with a valid buffer. This test case seems to be just checking for a specific implementation. Another implementation could create an HK message buffer on the stack, fill it, and send it - and that's perfectly valid too.

If the goal is to check that CFE_SB_SendMsg was invoked, but not care about the message content, I recommend using UT_GetStubCount. (This is all a coverage-oriented test really needs to do.)

If the goal is to check that the message buffer actually had valid telemetry inside it, then write a hook to validate the message content.

But validating that the message pointer matches the address of a global just proves that the compiler worked (i.e. the same address referenced in two places in the code is the same).

@skliper
Contributor

skliper commented Nov 19, 2020

This just revolves back to our fundamental disagreement with what a unit test is and how it should work.

Agreed. The stubs were designed to support coverage tests. Any "functional" testing you do beyond that may not be supported by the default stubs, and in that case the recommendation is to write a custom hook.

@asgibson
Contributor Author

But validating that the message pointer matches the address of a global just proves that the compiler worked (i.e. the same address referenced in two places in the code is the same).

I do understand what you are saying here, but will there ever be a valid situation where these global addresses are different? I have only seen consistent behavior in these types of tests. If these checks are not valid, then I would reconsider the implementation.

I know we differ in our testing views. This is not a functional test and it is not a coverage-oriented test. It is a behavioral test. It shows the behavior of the code (admittedly I usually also have the stub counts here too, but I think I punted at the time, expecting to return because of the TODO and this ticket). If all you are looking for is coverage, why even assert anything? I can cover all the code and verify nothing. Literally, comment out all the asserts and it causes no change to the coverage report.

I am verifying the behavior of the code (what it says that it does); whether the function is valid for the system (does what we need it to do) is not the goal of a unit test. In this case, that CF_HkCmd calls two functions with the expected value. A functional test is concerned with "When I asked the system for the hk, did I get it?" This unit test is concerned with "When CF_HkCmd is called, does it Timestamp and Send the hk message?" If it only checked that Timestamp and SendMsg were called, how do we know it sent the intended message? I don't want to wait until a functional test tells me that the message is wrong and then have to track down why, because that could be anywhere in the chain of function calls.

Unit tests find problems early and tackle them during development, which is why I highly recommend Test Driven Design. Functional tests are not meant to find errors at the code level; they should find design issues. If code is well unit tested, the majority of problems at the functional test level are design issues, not coding issues.

But as has happened in the past with other tickets, we have veered way off topic. I've said enough, if there is to be no change to the stub, I will sally forth with using my own hooks when necessary.

@skliper
Contributor

skliper commented Nov 23, 2020

I'm not sure what you mean by "behavioral test". If we use OSAL as an example, we have functional tests (built with the full stack such that the external effects can be verified) at osal/src/tests and we have coverage tests at osal/src/unit-test-coverage (built with stubs). We write these tests when we implement/update the code, so there's no need to "wait" for testing. We also have build/requirements verification testing along with integration/system validation testing which does happen later, but that's not what I mean by functional testing. We also have a framework for functional testing within cFE (to cover PSP and cFE, and it supports apps if they want to use it), but the majority of tests are still being converted from a previous framework. Ideally we'll transition to running all of these tests regularly on multiple platforms such that there's no need to wait for anything, but for now at least we have the coverage and functional tests in CI.

While I agree asserts are not strictly required for just coverage, our goal is "high-value" asserts where if they fail they are more likely to indicate a real issue with the code. Checking what address is passed into CFE_SB_SendMsg when it's configured to return an error has not in the past met this measure. This is partially driven by a desire to minimize required coverage test updates based on implementation changes. We aren't required to fail the test if the implementation changes to use the zero copy SB interface, or a locally defined message. Obviously a judgement call as to what we do, and how much is supported from a generic stub. We've picked a conceptual, admittedly somewhat fuzzy line that is open to improvements/adjustments based on contributions. It's not that we can't change the stub given open source agreement (I wouldn't oppose a PR that implemented your requested change, but I can't speak for the rest of the CCB), it's just that we don't have sufficient resources to implement every request using the core development team.

@asgibson
Contributor Author

The goal of a behavioral test is to verify that the code will work as intended. A behavioral test should do the following things (a skeletal sketch follows the list):

  1. Test one and only one behavior of the code under test (CUT)
  2. Have a title that is indicative of the expected/desired behavior
  3. Isolate the CUT as much as is possible from other code with complete isolation being the ideal
  4. (Arrange) Have a setup that indicates the necessary states that will cause the behavior
  5. (Act) Call the CUT
  6. (Assert) Verify the necessary values/results that prove the behavior occurred during the call
  7. Be completely independent of any other tests
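
A skeletal sketch of that shape in ut-assert terms (every name here - CUT_Function, SomeDependency, SOME_ERROR - is a placeholder, not real code):

void Test_CUT_Function_DoesExpectedBehavior(void)
{
    /* Arrange - set up only the state that triggers the behavior under test */
    UT_SetForceFail(UT_KEY(SomeDependency), SOME_ERROR);

    /* Act - call the code under test exactly once */
    CUT_Function();

    /* Assert - verify the values that prove the behavior occurred */
    UtAssert_True(UT_GetStubCount(UT_KEY(SomeDependency)) == 1,
                  "SomeDependency was invoked once");
}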

@jphickey
Contributor

The part that concerns me is when tests are (seemingly) just confirming one specific implementation - i.e. basically writing source code that confirms that the implementation was written in one particular way.

There is certainly some value in making sure all the if/else branches have been executed to confirm that they don't contain fundamentally broken code (uninitialized pointers, divide by zero, that type of stuff). And to achieve this certainly requires some knowledge as to what the implementation is testing in its conditionals so it can set the stage for each conditional path.

But if this gets taken to the next level and the test starts to assert on every small implementation detail, it can become a maintenance issue - where the test breaks with every nuisance change even when that change is internal only and does not affect functional requirements.

While I don't have any actual evidence - just anecdotal from personal experience - CFE+OSAL are already at the point where getting coverage tests working can easily take 3-4x the amount of time as getting the "real" (fsw) part of a change implemented, because they are so sensitive to one specific implementation. A small implementation change can become a giant coverage test headache pretty easily.

There must be a balance between catching every small implementation change and catching things that are likely to be real errors.

@asgibson
Contributor Author

If those time frames are accurate, they are not in line with the extra time I expect unit testing to take. I have thoughts on what is causing that, but none of them have to do with the fact that unit tests are being written. Instead, I would focus on the manner of their creation to fix those issues. I would like to delve into that more robustly, but in a more proper forum than a single issue ticket.

I really like ut-assert. I used to think C code would be extremely difficult to unit test. It has shown me how wrong I was. I can get through a lot of situations with what ut-assert provides. When I am humming along testing code and I run into something where the expected behavior of a stub is not what I think it is, often I think that I have done something wrong and I start there. That invariably leads to the kind of hair-pulling frustration you get when you know something is wrong but cannot find it! Then, eventually, I question ut-assert and find the root cause there. It's just frustrating. Then I question ut-assert more and the problem goes the other way: my code gets scrutinized last.

Truly, inconsistency is causing the issue here (as pointed out in #1026). When I used CFE_SB_SendMsg and attempted to see the pointer it received, I went bonkers trying to find what I did wrong. This was because I had used it before and it had worked (when a success return value was forced/defaulted), only to eventually find that neither my code nor the production code was the problem; the behavior of ut-assert was simply inconsistent with what I expected.

Let's work on fixing inconsistency. Even if it ends up "all stubs don't save argument values when in an error return state" (not my personal choice), at least I'll know what to expect.

@jphickey
Contributor

There are several reasons why the unit tests have become a major time sink, including but not limited to the fact that there is a tendency to not follow general best practices, i.e. they tend to be developed in a "just make it work" mindset, not with the same design principles and organization that FSW uses. Usually an afterthought, written under time pressure to get it done. Test code is cut and pasted around incessantly, and test functions/programs become severely bloated (sb_UT.c used to be well over 10000 lines long, so big that Eclipse complained when opening it that it had to shut off certain features).

On multiple occasions I've made a change to a FSW structure that affects the FSW in just a few locations, and find it works great when testing the FSW with no major issues because FSW follows proper code design and generally accesses things via defined APIs. So as long as APIs are adhered to, code doesn't break. But then I find that the thing that was so easy to update in FSW is referenced in unit tests in 50-100 different places - bypassing APIs - and find it takes hours and hours to debug and fix the unit tests.

One way to lessen the bloat of UT is to make the stubs target a "typical" use case and only require the test case to set up where it differs from that typical case, and only if it needs to do so in order to reach its target path. In many stubs this is a return of 0 (CFE_SUCCESS) plus some fixed "safe" side-effect/output value, where applicable. If the test overrides the return value to be an error, then the stubs typically should not provide the side-effect/output value and/or they should set it to an unsafe value such as a null pointer - so if the application is relying on/dereferencing the output, it will segfault and get caught (because applications typically shouldn't be relying on the outputs/side-effects of API calls that have returned an error).
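
A generic sketch of that pattern (EXAMPLE_GetBuffer and its safe output buffer are made-up names for illustration, not any actual cFE stub):

static uint8 UT_Example_SafeBuffer[16];

int32 EXAMPLE_GetBuffer(void **BufPtr)
{
    int32 status;

    UT_Stub_RegisterContext(UT_KEY(EXAMPLE_GetBuffer), BufPtr);

    /* Defaults to success (0) unless the test forces/defers a return value */
    status = UT_DEFAULT_IMPL(EXAMPLE_GetBuffer);

    if (status >= 0)
    {
        /* Nominal case: provide a fixed "safe" output the unit can use */
        *BufPtr = UT_Example_SafeBuffer;
    }
    else
    {
        /* Error case: unsafe output so any misuse by the unit gets caught */
        *BufPtr = NULL;
    }

    return status;
}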

I do agree that the stubs are not always consistent. Some of this is because what constitutes a "typical" use case is a subjective matter. Other reasons are historical - because there were thousands of lines of existing test cases that depended on some oddball behavior that the stub had to mimic - and we don't have the time budget to rewrite them all.

So yes, please do bring up any issues of inconsistency in the stubs - there is definitely a desire to make them follow the same pattern as much as we can, and they are far from perfect - there is a lot of history to them.

Although in theory it would be nice to have stubs be as "dumb" as possible and have every bit of their behavior dictated solely by the test cases - this is often not practical, as it means the code to set up the typical case will be cut and paste thousands of times over - so when it needs to change (and it will!) we have an even more massive UT headache to fix. So I do think each stub needs a certain "canned" default behavior - for its nominal/success mode and possibly a different one for its error mode - depending on how its typically used.

@skliper
Contributor

skliper commented Jan 15, 2021

Sounds like we've settled on keeping the stubs simple/standard and adding a hook if additional functionality is needed. Note CFE_SB_SendMsg was deprecated in favor of CFE_SB_TransmitMsg, but the stub is very similar (other than an added parameter). Closing with that context.
