Detect and report rotten green tests #176

Open
cschreib opened this issue Jun 9, 2024 · 3 comments

cschreib commented Jun 9, 2024

I read an article the other day on "rotten green tests", i.e., tests that are reported "green" (passed) but that don't actually execute any checks, and hence don't test what they should. There is a report on the topic, and a PR for GoogleTest that attempts to catch and report this. Perhaps we should do something about it too.
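
A minimal sketch of the kind of test this is about (illustrative only, assuming snitch's snitch/snitch.hpp header; get_values() is a made-up data source):

#include <snitch/snitch.hpp>

#include <vector>

// Hypothetical data source; imagine a refactor accidentally left it empty.
std::vector<int> get_values() {
    return {};
}

TEST_CASE("values are positive", "[example]") {
    for (int value : get_values()) {
        // Never executed, because the loop body is never entered,
        // yet the test is still reported as green.
        CHECK(value > 0);
    }
}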

cschreib added the enhancement label Jun 9, 2024
cschreib added this to the v1.x milestone Jun 9, 2024

horenmar commented Dec 8, 2024

You can probably yoink this from Catch2 semi-directly

For context, the reason it is not the default is the use of sections-as-generators, like this:

TEST_CASE() {
    int i;
    SECTION("A") { i = 2; }
    SECTION("B") { i = 3; }
    REQUIRE(foo(i));
}


horenmar commented Dec 8, 2024

Another important piece of this puzzle is to error out on empty test runs. This catches cases where a link issue strips out the tests, or where automation and your terminal have an escaping fight and you end up with an invalid test spec.
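
For illustration, this kind of guard can be approximated today with a custom main; the sketch below uses GoogleTest rather than snitch and is only an illustration of the idea, not an existing feature:

#include <gtest/gtest.h>

#include <cstdio>

int main(int argc, char** argv) {
    ::testing::InitGoogleTest(&argc, argv);
    const int result = RUN_ALL_TESTS();

    // test_to_run_count() is the number of tests selected to run; zero
    // usually means a bad test filter or tests stripped out at link time.
    if (::testing::UnitTest::GetInstance()->test_to_run_count() == 0) {
        std::fprintf(stderr, "error: no tests were run\n");
        return 1;
    }

    return result;
}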

cschreib commented

Thanks for the tip! I like the idea of the --warn <thing> option; it seems very useful and probably easy to implement. I think having it opt-in is probably a good idea, indeed. Have you thought of introducing two levels, i.e., warn on empty test case, and warn on empty leaf section?
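
To make the two levels concrete, a sketch of the difference (test names made up, assuming snitch's snitch/snitch.hpp header):

#include <snitch/snitch.hpp>

#include <vector>

// Level 1: the whole test case executes no assertion at all.
TEST_CASE("empty test case", "[warnings]") {
    std::vector<int> values = {1, 2, 3}; // some setup...
    values.clear();                      // ...but no CHECK/REQUIRE anywhere
}

// Level 2: the test case does execute assertions, but one leaf section is
// empty, so that particular branch silently tests nothing.
TEST_CASE("empty leaf section", "[warnings]") {
    SECTION("checked") {
        REQUIRE(1 + 1 == 2);
    }
    SECTION("forgotten") {
        // no assertions here
    }
}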

Since I created this ticket, I ended up implementing a more intrusive approach at work, which also addresses the "empty test" / "rotten green test" problem. We use GoogleTest there, but the technique would work for any test framework. I wonder if we could offer it or something similar in snitch as well.

The idea is to declare at the start of the test the list of "concepts" / "features" / "requirements" you want to test, and explicitly mark each of these as "covered" at the end of the relevant checks (REQUIRE/CHECK/etc.). It looks sort of like this:

TEST_CASE() {
    const auto checker = requirement::expected("can meow properly", "can bark gently");

    for (const auto& animal : get_all_animals()) {
        if (animal.type() == "cat") {
            // Do some checks specific to cats.
            CHECK(animal.sound() == "meow");
            requirement::covered("can meow properly");
        } else if (animal.type() == "dog") {
            // Do some checks specific to dogs.
            CHECK(animal.sound() == "woof");
            requirement::covered("can bark gently");
        }
    }
}

The checker variable will automatically verify in its destructor that all "expected" features have been "covered", and report a failure if not. In the example above, the test might actually end up executing no checks if get_all_animals() returns an empty list, or returns a list of animals none of which are cats or dogs. Or, more subtly, get_all_animals() could return a list with only cats and no dogs, and you silently end up never testing your dogs. The empty-test warnings wouldn't catch that, since some CHECKs would still have run (for the cats only).
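
For reference, a rough sketch of what such a checker could look like (simplified and made up for illustration, not the exact implementation; a real version would report through the framework's failure macro, e.g. ADD_FAILURE() in GoogleTest, rather than printing to stderr):

#include <cstdio>
#include <set>
#include <string>

namespace requirement {

class checker;
inline const checker* current_checker = nullptr; // checker of the currently running test

class checker {
public:
    template <typename... Names>
    explicit checker(Names... names) : expected_{names...} {
        current_checker = this;
    }

    ~checker() {
        current_checker = nullptr;
        for (const auto& name : expected_) {
            if (covered_.count(name) == 0) {
                // A real implementation would fail the test here through the
                // framework, instead of just printing a message.
                std::fprintf(stderr, "requirement not covered: %s\n", name.c_str());
            }
        }
    }

    void mark_covered(const std::string& name) const {
        covered_.insert(name);
    }

private:
    std::set<std::string> expected_;
    // mutable so that covered() still works when the checker variable is
    // declared const, as in the example above.
    mutable std::set<std::string> covered_;
};

// Used at the start of a test: const auto checker = requirement::expected("a", "b");
template <typename... Names>
checker expected(Names... names) {
    return checker(names...);
}

// Used next to the checks: requirement::covered("a");
inline void covered(const std::string& name) {
    if (current_checker != nullptr) {
        current_checker->mark_covered(name);
    }
}

} // namespace requirement

This sketch only assumes the framework lets you report a failure from a destructor running at the end of the test body; everything else is framework-agnostic, which is why the technique should work with any test framework.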

There are ways to restructure the tests to avoid these pitfalls, or at least in such a way that the empty-test warnings would catch the problem (using sections, for example). But that might not always be practical, e.g., if get_all_animals() is costly.

This approach also catches more accidents, like an early return (or an exception, if exceptions are manually handled inside the test) bypassing some but not all of the checks. It also helps with a particular worry of mine: strengthening cases where the actual checks are outsourced to a "helper" function (usually to reduce code duplication). This tends to visually separate the checks from the test case, making it easy to refactor the checks and break the intent of the test case:

void check_cat(const auto& animal);
void check_dog(const auto& animal);

TEST_CASE() {
    for (const auto& animal : get_all_animals()) {
        if (animal.type() == "cat") {
            check_cat(animal);
        } else if (animal.type() == "dog") {
            check_dog(animal);
        }
    }
}

void check_cat(const auto& animal) {
    // Do some checks specific to cats...

    // Oops, somebody removed this because it was failing another test.
    // CHECK(animal.sound() == "meow");
}

With the requirement::covered() statement, there is a visual warning sign that something depends on that check:

void check_cat(const auto& animal) {
    // Do some checks specific to cats...

    CHECK(animal.sound() == "meow");           // these two lines...
    requirement::covered("can meow properly"); // ... would likely be removed together
}

And if someone is bold enough to remove the check anyway, taking both lines out, the checker will still complain and fail the test.
