Add UtAssert method to verify expected stub argument value(s) #968

Open

jphickey opened this issue Apr 23, 2021 · 9 comments

@jphickey (Contributor)

Is your feature request related to a problem? Please describe.
In some test cases it is desired to not only verify that a specific stub function was called, but also to verify the specific arguments that the stub function was called with.

This is currently possible by writing a hook function to check the argument values, but that requires writing a specific hook for every API.
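For reference, a minimal sketch of that hook-based workaround using the existing ut_assert API; the unit under test (WriteData()) and the stub argument name ("nbytes") are assumptions for illustration and may differ by OSAL version:

```c
#include "osapi.h"
#include "utassert.h"
#include "utstubs.h"

extern void WriteData(void); /* hypothetical unit under test that calls OS_write() */

/* Hook attached to the OS_write stub to check the arguments it was called with.
 * The argument name "nbytes" is an assumption about how the stub registered its
 * context and may differ between OSAL versions. */
static int32 UT_CheckWriteArgs_Hook(void *UserObj, int32 StubRetcode,
                                    uint32 CallCount, const UT_StubContext_t *Context)
{
    size_t nbytes = UT_Hook_GetArgValueByName(Context, "nbytes", size_t);

    UtAssert_True(nbytes == 100, "OS_write() nbytes (%lu) == 100", (unsigned long)nbytes);

    return StubRetcode;
}

void Test_WriteData(void)
{
    /* Arrange: attach the argument-checking hook to the OS_write stub */
    UT_SetHookFunction(UT_KEY(OS_write), UT_CheckWriteArgs_Hook, NULL);

    /* Act: run the unit under test */
    WriteData();

    /* Assert: confirm the stub was actually reached */
    UtAssert_True(UT_GetStubCount(UT_KEY(OS_write)) == 1, "OS_write() called once");
}
```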

Describe the solution you'd like
Provide a more generic facility such that a test case can pre-assemble an object (similar to the Context object which is passed to the hook) that has the expected argument names + values. Then provide a generic method to compare those to the actual argument values when the function is called. The latter bit can be done by common code at the same time the user-defined hooks are invoked.

Describe alternatives you've considered
A scaled-back alternative might be to provide a facility to register a persistent hook that is called for every function, and is not forgotten/unregistered when the stubs are reset like other hooks are.

This would allow a test case to register a function to be called with every stub in every test without having to re-register it each time. I call this an alternative because the test case would then implement its own generic argument value check hook, but it would simplify the process by allowing that hook to apply universally and persistently.
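A rough sketch of what that could look like from the test's side; the hook uses the existing UT_HookFunc_t signature, while UT_SetGlobalHookFunction() is hypothetical (it is exactly the persistent, apply-to-all registration this alternative would add):

```c
#include "utstubs.h"

/* Generic argument-check hook implemented by the test suite itself; with the
 * proposed facility it would be invoked for every stub in every test. */
static int32 UT_GenericArgCheck_Hook(void *UserObj, int32 StubRetcode,
                                     uint32 CallCount, const UT_StubContext_t *Context)
{
    /* Compare the argument values recorded in Context (e.g. via
     * UT_Hook_GetArgValueByName()) against the expectations stored in UserObj,
     * and UtAssert on any mismatch. */
    return StubRetcode;
}

/* Hypothetical registration call: applies to every stub and survives stub
 * resets.  It does not exist in ut_assert today. */
extern void UT_SetGlobalHookFunction(UT_HookFunc_t HookFunc, void *UserObj);

void Test_Global_Setup(void)
{
    UT_SetGlobalHookFunction(UT_GenericArgCheck_Hook, NULL);
}
```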

Requester Info
Joseph Hickey, Vantage Systems, Inc.

@jphickey (Contributor, Author)

Ping @asgibson, this idea came from discussions yesterday.

Disclaimer is that I'm still not a fan of checking for specific stub argument values in a coverage test - because this level of detail does not translate to a functional requirement. (i.e. if a requirement is that a function writes content to a data file, a functional test should check the content created, not that it called OS_write with a specific data buffer and size - that's checking for a specific implementation, not a result).

Nonetheless, this wouldn't be too hard to implement, so if there is community desire to have this feature/capability I can implement it.

@asgibson (Contributor)

Disclaimer: I'm still not a fan of calling them coverage tests - because coverage is a byproduct of proper unit testing, whereas proper unit testing is not a byproduct of coverage tests. Checking that code (at the lowest level possible) is properly written is the goal of unit testing; it is testing the behavior of the code under test and not the implementation. This can be demonstrated by writing tests for an unoptimized function, then optimizing the code for speed/memory usage/whatever without changing its behavior: the unit tests do not falter.

This paradigm is implemented by other unit testing frameworks. I use RSpec when I write Ruby unit tests; it has the expectation design where the values expected by a method call in the CUT are set prior to the call. It does work quite well for unit testing. Although it does move what is typically in the Assert phase of a test into the Arrange (aka Setup) phase, the expected information still stays within the unit test itself. Information available in the unit test is good for documentation and knowledge capture.

If the model that is designed will have similar characteristics I am all for it.

Is this only conceptual at this phase or is there a design that can be seen?

@jphickey (Contributor, Author)

Checking that code (at the lowest level possible) is properly written is the goal of unit testing

I agree on this, except I call it "functional" test, and I define "properly written" as producing the correct end result, not that it matches some specific reference implementation.

To go back to a more concrete example, if the objective is to write a file (let's say 100 bytes), it shouldn't matter if the function used write or pwrite (POSIX), or fwrite() (ANSI), or OS_write() or OS_TimedWrite() (OSAL) to actually implement that write. It also shouldn't matter if the function wrote the 100 bytes by calling the write function one byte at a time (buffer of size 1, with 100 calls), or with a single call of size 100, or something in between. It could even write more than 100 bytes, then truncate back to the actual size. At the end of the day, a functional test should confirm that the correct content was produced, not the specific method used to produce it.
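As a concrete sketch of that kind of content-oriented check: WriteReport() and its expected 100-byte content are made up for illustration, and the OSAL file API calls assume the signatures in recent OSAL versions.

```c
#include <string.h>
#include "osapi.h"
#include "utassert.h"

extern int32 WriteReport(const char *Filename); /* hypothetical unit under test */

void Test_WriteReport_Content(void)
{
    uint8     expected[100];
    uint8     actual[100];
    osal_id_t fd;

    memset(expected, 0xA5, sizeof(expected)); /* whatever content the requirement calls for */

    /* Act: run the unit under test against the real file system */
    UtAssert_True(WriteReport("/ram/report.dat") == OS_SUCCESS, "WriteReport() succeeded");

    /* Assert on the result that was produced, not on how it was produced */
    UtAssert_True(OS_OpenCreate(&fd, "/ram/report.dat", OS_FILE_FLAG_NONE, OS_READ_ONLY) == OS_SUCCESS,
                  "report file exists");
    UtAssert_True(OS_read(fd, actual, sizeof(actual)) == (int32)sizeof(actual), "read 100 bytes back");
    UtAssert_True(memcmp(actual, expected, sizeof(expected)) == 0, "file content matches expectation");
    OS_close(fd);
}
```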

This is why I don't necessarily agree with this feature; one can easily get into the "weeds" of checking for a very specific implementation, that is, that a function called OS_write() with buffer A and size X, followed by OS_write() with buffer B and size Y, and so forth. That isn't checking for correctness of behavior, it's only checking whether someone changed the implementation somehow.

The objective of a coverage test should be to confirm that the implementation is coherent/valid for all possible branch paths through the function, while leaving it to a functional test to check that the implementation actually produced the intended result/effect.

For example, it is possible that an error handler in the beginning of a function might leave a pointer or object uninitialized, which might be dereferenced/used later in the function. By confirming all possible branch paths (and ideally, all combinations thereof) are executed, this type of error will hopefully be caught/noticed. This is the primary objective of a coverage test, as it is often not possible to exercise every combination of error handling otherwise.

There are many valid cases where a coverage test can check for correctness too, particularly for localized algorithms that don't depend on external calls. It can certainly be valuable to do this where it is relevant/possible. This can help ensure that your functional test will pass, but it isn't a substitute for it. One still needs to confirm that it works in the "real world" outside of mocks and stubs.

Is this only conceptual at this phase or is there a design that can be seen?

It is conceptual; I have not implemented anything yet and will await consensus on the subject. The idea is that there is an object, similar to the "Context" object that is already defined for hooks/handlers, that can be instantiated by a test case prior to executing the unit under test and associated with a stub function. When that stub is called during test execution, it will assert that the actual arguments match the expected values and report any differences between them.
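For discussion only, a rough usage sketch of that concept; every UT_ExpectedArgs_* / UT_SetExpectedArgs name below is hypothetical and not part of the current ut_assert API (only UT_KEY() exists today):

```c
#include "osapi.h"
#include "utstubs.h"

/* Everything prefixed UT_ExpectedArgs_ / UT_SetExpectedArgs is hypothetical;
 * it illustrates the proposed facility and does not exist in ut_assert today. */
typedef struct { uint32 Count; } UT_ExpectedArgs_t; /* placeholder definition */
extern void UT_ExpectedArgs_Init(UT_ExpectedArgs_t *ExpArgs);
extern void UT_ExpectedArgs_AddUint(UT_ExpectedArgs_t *ExpArgs, const char *Name, unsigned long Value);
extern void UT_SetExpectedArgs(UT_EntryKey_t FuncKey, const UT_ExpectedArgs_t *ExpArgs);

extern void WriteData(void); /* hypothetical unit under test that calls OS_write() */

void Test_WriteData_Args(void)
{
    UT_ExpectedArgs_t ExpArgs;

    /* Arrange: pre-assemble the expected argument names + values (analogous to
     * the Context object passed to hooks) and associate them with the stub */
    UT_ExpectedArgs_Init(&ExpArgs);
    UT_ExpectedArgs_AddUint(&ExpArgs, "nbytes", 100);
    UT_SetExpectedArgs(UT_KEY(OS_write), &ExpArgs);

    /* Act: when the unit under test calls OS_write(), the common stub code would
     * compare the actual arguments to ExpArgs and assert/report any mismatch */
    WriteData();
}
```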

@asgibson (Contributor)

I agree on this, except I call it "functional" test

When I hear the term "functional" for a test, it has meant "an integration test (more than a single unit) of a high level (usually multiple units, more likely the whole software system)." This may be a source of misunderstanding. My description of the levels of software testing is this: unit > integration > end to end (e2e). I now use e2e to describe what I previously called "functional" level tests. All tests are unit tests, but only unit tests are unit tests. While I realize that seems nonsensical, it really means: call a test for what it is at the highest level. So, as many have previously pointed out, any test can be considered a unit test (based upon the definition of a unit!). However, only those that access the code at the "most independent" level possible (often the method level) should be given the moniker "unit test."

If the functional tests plus the coverage tests are both being done, I am not surprised that you fret over the time it takes to complete the unit testing. The way I write unit tests appears to me to provide both the behavioral testing of the code at the most independent level and coverage. Essentially it is a 2 for 1 deal. No wonder you think coverage testing adds so much time to the development!

But wait, there is more: you get living, breathing documentation of the code that offers ready-made tests for when oddities are encountered during higher-level integration testing. Integration tests now become what they are designed to be, testing the interaction between units and absolutely not attempting to debug coding errors. In my experience, with my type of unit tests in place, the only times method design (technically unit design, aka coding errors) problems show up during integration-level test runs are when the unit test was not written. I cannot understand why unit testing is not done this way already.

From https://www.toptal.com/qa/how-to-write-testable-code-and-why-it-matters:
"the real problems that complicate unit testing, and introduce expensive complexity, are a result of poorly-designed, untestable code."

The above quote is exactly why I advocate for Test Driven Design. Write the tests for the behavior you want the code to have. Watch it fail. Write some production code. Run the test again. When it fails, fix the production code (wash, rinse, repeat as necessary). When it passes your production code has the test's described behavior. Run all the unit tests, show your new behavior did not change old behaviors. Magic! Well, no, actually, Science!!
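As a small illustration of that loop with ut_assert, where Accumulate() is a made-up unit under test: the test is written first, fails until the production code provides the described behavior, and then guards it going forward.

```c
#include "utassert.h"
#include "uttest.h"

extern int32 Accumulate(int32 Value); /* hypothetical unit under test: returns the running sum */

/* Written first: this fails until Accumulate() actually implements the
 * described behavior, then passes (and keeps passing) once it does. */
void Test_Accumulate_RunningSum(void)
{
    UtAssert_True(Accumulate(2) == 2, "Accumulate(2) returns 2");
    UtAssert_True(Accumulate(3) == 5, "Accumulate(3) returns the running sum 5");
}

void UtTest_Setup(void)
{
    UtTest_Add(Test_Accumulate_RunningSum, NULL, NULL, "Accumulate running sum");
}
```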

As far as concrete examples go, I have been told, on good authority, that those are usually manipulations designed only to strengthen one's own position. Often they are pigeonholed to that end. I will try to reasonably respond. My unit tests would care about the actions within the method that depend upon whether an outside function is called once vs. multiple times. I would want to know that my code under test (CUT) responds correctly to an error returned from that function. I would be very interested to know that the CUT responds correctly if that error appears in the first or last call, as both are edge cases. In the "only one call" method these are the same and would only require a single test; however, in the "100 calls" case they are absolutely not.

@skliper (Contributor) commented Apr 30, 2021

I appreciate the variety of perspectives, which lead to many different approaches in addressing testing requirements. I have found it beneficial to balance the level of effort invested in testing with stubs vs full stack, whatever name you want to apply to these configurations. Verifying functionality utilizing the full stack allows for testing whether the actual requirement was satisfied, and testing with stubs allows for exercising and verifying that error paths and other corner cases are implemented appropriately. While I agree completely isolating a unit (testing with stubs) is absolutely useful and part of how we satisfy our unit test requirements, I encourage considering the efficiencies and usefulness that full stack testing can provide even when debugging (and including this sort of testing in CI).

The OSAL is an excellent example, where the stub tests are implemented in src/unit-test-coverage and the full stack tests are implemented in src/tests. Note the full stack tests utilize the full stack on the target (when cross compiled) to verify functionality at the lowest level (API calls) in the intended environment. We've also begun implementing similar full stack API level tests in cFE, a similar concept to the old "test apps" but utilizing ut_assert. The benefit of this approach is that verifying functionality using the full stack does not require intimate knowledge and codification of the various underlying OS and platform behaviors, as a test with stubs would require to attempt to show the same thing.

Really I'm just trying to encourage utilization of the entire toolset being provided to satisfy testing requirements. Obviously there's also testing via external cmd/tlm systems w/ scripting and verification (whatever you want to call that flavor, but typically what I think of as integration and/or build verification testing), but that's not what I'm referring to above when I think of "unit testing".

@skliper (Contributor) commented Apr 30, 2021

As a developer, I've frequently identified bugs when using the full stack that testing with stubs missed. This experience base leads me to distrust claims that testing with stubs meets the functional part of unit testing requirements. You could argue that the tests within the stub environment weren't complex enough, but considering the level of effort required it has been my experience that testing with the full stack is a more efficient and effective approach to proving functionality at a low level.

@asgibson (Contributor)

I am going on what I was taught and what I have experienced first hand. If I didn't think it worked I wouldn't be touting it; I can only assume that anyone would do the same.

A point I will return to is @jphickey's previous mentions (in other threads) that this whole process of functional and coverage tests takes too long (with, I believe, an emphasis on the coverage part), which is an earmark of the exact problems I have seen solved by the approach I advocate. I do not find that a "go back and test" approach works nearly as well, nor does having a different developer write the unit tests (which, yes, is what I currently do with cf).

"the real problems that complicate unit testing, and introduce expensive complexity, are a result of poorly-designed, untestable code."

Test Driven Design (TDD) is a strategy that solves some of the main issues that plague other unit testing methods. Code developers write the behavioral unit tests that describe what the code they are going to write does. It is not what a developer thinks it does; it literally becomes what it does when the test passes. This provides first hand knowledge capture. It also gains value over time, unlike comments on code that do not really explain what the behavior of the code is supposed to be, so that later the code must be replaced because no one knows what it did in the first place, except for the developer who is long since gone! (phew!) It is living, breathing documentation of the code by the person who wrote it.

The TDD core process is easy to follow: write the test for the ascribed behavior; watch it fail; write the production code to make it pass; watch it pass. You have proven a behavior and, along with the other unit tests that still pass, also proven that no other behavior was affected by the production code that was written. When they don't still pass, bam!, instant feedback that something is amiss. This is amazing stuff. It stops bugs from even getting committed!

Coverage then becomes a valuable tool to show where you have been lax in testing, but not truly the end goal. The end goal is better designed code that is fully tested and has a built-in rationale associated with it! When development is done TDD-style, high coverage becomes the norm because you are always developing tests while the code is developed, and the worry about how to get that coverage never happens. Coverage is a requirement only because it shows when something has not been tested (which is bad!); it will never prove anything other than that the code has been touched while in a state that was valid for a particular path. We should not be testing for coverage just to show it can be done or to "check the box." Coverage should be a byproduct of the testing strategy.

Targeted direct testing of independent units of code that verify behavior written in a TDD manner offers:

  • knowledge capture from the person who wrote the code
  • built in debugging for less error prone code
  • runnable documentation of the intended behaviors
  • regression testing
  • code coverage
  • high confidence of success

@skliper (Contributor) commented May 3, 2021

I'm actually a fan of the option to turn on code coverage statistics when testing with the full stack, which then shows what logic is not covered using that flavor of testing. Really it should be everything except the error cases that are impossible or significantly difficult to reach without utilizing stubs. Then testing with stubs only really needs to check the remaining error conditions and whether the correct paths/responses are exercised. I think we are all agreeing that unit testing is good, and part of design. I think our only divergence is I don't agree testing with stubs can be used to verify functionality. I understand you can verify the behavior of the code under test, but that only confirms that it behaves in a way that matches whatever the test is checking (which doesn't mean the code will function correctly in a full stack environment).

Either way, I'll continue to advocate for both flavors of testing during development since I'm not aware of any other way to meet the unit test requirements.

@asgibson (Contributor)

I still see it as doing twice the amount of work necessary, but I will also agree that testing, by whatever means, is always better than not testing at all.

I do think that there is far too much emphasis placed upon the term "coverage", but please do not misunderstand, I do get why it is mandatory. My point is simply this: full coverage is a necessary, but insufficient, condition to ensure adequate testing (thanks to my mentee for the wording on this, most excellent!). When you place emphasis on coverage, the real reason it exists is lost -- it becomes only a check box. Coverage is only a requirement because lack of it means you have not adequately tested, but having it does not mean that you have adequately tested the code. I think this is why there are coverage and functional tests -- functional to do the verification and coverage to fill in the gaps to show the code does not fail, segfault, or some other third thing? The type of testing I advocate gets the coverage because it tests everything by design.

Unit tests (small, direct, lots of them), Integration tests (far fewer, simple as two units up to far more complex), End-to-end tests (full system, requirements verification in "full stack environment"). Start small, build large, verify all along the way. More emphasis on design than coverage for both tests and production code, more emphasis on testing correctly than box checking (that happens organically), more confidence in reusable code.

I am advocating for this particular enhancement because it will facilitate this. I do not know if it will help in the coverage tests and/or functional tests, because I do not write them.

jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022