Use Common Test for testing when Gradualizer should pass, fail, and its known problems #567

erszcz · 2024-06-02T21:11:48Z

This PR adds the ability to test Gradualizer with Common Test (CT), which reports results in a way that makes comparing builds much more convenient. CT generates HTML test run reports. The Makefile rule, make ct, labels each test run with git commit info and, if the run tested a modified working directory, a (modified) annotation. All in all, when introducing a change, this makes it easier to spot what broke and to compare results between builds.

For now, it's not integrated with CI.

Usage:

make ct

or if we're only interested in a specific suite:

rebar3 ct --suite=known_problems_should_pass_SUITE --label="git: $(git describe --tags --always) $(git diff --no-ext-diff --quiet --exit-code || echo '(modified)')"
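As a rough sketch of what the --label argument above evaluates to, the same logic can be written as a small shell function. The name ct_label is hypothetical and not part of this PR; the git invocations are the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): builds the same label text
# as the --label argument above, apart from trailing whitespace.
ct_label() {
    # Nearest tag, or an abbreviated commit hash if there are no tags.
    desc=$(git describe --tags --always 2>/dev/null)
    # git diff --quiet exits non-zero when tracked files have unstaged changes.
    if git diff --no-ext-diff --quiet --exit-code 2>/dev/null; then
        printf 'git: %s\n' "$desc"
    else
        printf 'git: %s (modified)\n' "$desc"
    fi
}

ct_label
```

Note that plain git diff compares the working tree with the index, so staged-only changes and untracked files do not trigger the (modified) annotation.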

Example results

Command line:

Please note that line numbers are always the same, since the CT tests (test functions) are generated from a template. However, the suite and test name are sufficient to identify the typechecked file the CT test corresponds to.
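To illustrate how a suite and test name could be mapped back to the typechecked file, here is a minimal sketch. It assumes that the generated test name equals the file's base name and that suites correspond to directories such as test/should_pass and test/known_problems/should_fail; both are assumptions about the repository layout, and ct_test_file is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the file a generated CT test is assumed to check.
# Assumes <suite>_SUITE maps to test/<suite>/ and that known_problems_* suites
# live under test/known_problems/.
ct_test_file() {
    suite=$1
    name=$2
    dir=${suite%_SUITE}    # strip the _SUITE suffix, e.g. should_pass
    case $dir in
        known_problems_*) dir="known_problems/${dir#known_problems_}" ;;
    esac
    printf 'test/%s/%s.erl\n' "$dir" "$name"
}

ct_test_file should_pass_SUITE variable_binding_leaks
```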

HTML listing of all runs (just 2 in this case):

Two runs compared side by side:


…nfo_pass.erl This is necessary to avoid a problem with the test module name being used as the generated test name, which is a function name. It led to a name clash with the preexisting module_info/1, which is defined for every module.

xxdavid

I am not really familiar with Common Test and I also don't fully understand all the magic in gradualizer_dynamic_suite. With that disclaimer, this PR looks quite good to me (apart from a few comments I left). The output is nicer than EUnit's and, as you said, it's probably easier to compare between different versions of Gradualizer. It's a bit slower than the EUnit tests (15 secs vs 10 secs on my machine), but I don't think that's a problem. Overall, I consider it an improvement over the EUnit tests. Thanks for implementing it!

Do you plan to replace the EUnit tests for should_* with these CT tests, or do you want them to coexist (which would cause duplicated code)?

Thanks for extending the PR description and adding screenshots; it made the intended usage much easier to grasp. If this gets merged, maybe we should also document it somewhere.

test/should_fail_SUITE.erl

xxdavid · 2024-06-04T19:12:55Z

test/should_pass_SUITE.erl

+     variable_binding_leaks].
+
+should_pass_template(_@File) ->
+    ?assertEqual(ok, gradualizer:type_check_file(_@File, [{form_check_timeout_ms, 2000}])).


The EUnit tests print Gradualizer's error message describing the type error when a test fails. I find it useful for quickly glancing over what's going on without having to inspect all the failed test files one by one. Do you think that's possible with CT?

It's available in the CT HTML report, but that requires a little bit of clicking in the browser. It might be possible to get it in the shell, too, but I'd have to think some more about it.


zuiderkwast · 2024-06-11T15:01:44Z

Makefile

@@ -149,6 +149,9 @@ eunit: compile-tests
 	erl $(ERL_OPTS) -noinput -pa ebin -pa test -eval \
 	 '$(erl_run_eunit), halt().'

+ct:
+	@rebar3 ct --label "git: $$(git describe --tags --always) $$(git diff --no-ext-diff --quiet --exit-code || echo '(modified)')"


We made this whole makefile work without rebar3 because of some annoyances we had with it and to get better control of what's happening. If we mix rebar3 with non-rebar3, we'll have files built in various places and a mess in general. I don't like that.

Isn't it fairly straightforward to run ct without rebar3? It's a command line tool.

Add ct to the tests target and to .PHONY.

We should be able to modify the logic we have in erl_cover_run function to have coverage computed on eunit and commontest combined, or only commontest if we port all eunit tests to commontest.

To run a specific suite, we can do something similar to what erlang.mk does, e.g. make ct suite=known_problems_should_pass_SUITE.
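One possible shape for such a rule, sketched as the shell command a Makefile recipe might assemble. ct_run and its -pa, -dir, -logdir and -suite flags are standard OTP; the function name ct_command and the ebin, test and logs/ct paths are assumptions, not something this PR defines.

```shell
#!/bin/sh
# Hypothetical sketch: assemble a ct_run command line without rebar3.
# The ebin, test and logs/ct paths are assumptions about the build layout.
ct_command() {
    suite=${1:-}    # optional, e.g. known_problems_should_pass_SUITE
    set -- ct_run -pa ebin -dir test -logdir logs/ct
    if [ -n "$suite" ]; then
        # Restrict the run to a single suite found in the test directory.
        set -- "$@" -suite "$suite"
    fi
    # Print the command rather than running it.
    echo "$@"
}

ct_command known_problems_should_pass_SUITE
```

The function only prints the command. Before running it for real, the log directory has to exist (mkdir -p logs/ct), since ct_run does not create it.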

> We made this whole makefile work without rebar3...

Generally, I think rebar3 is quite mature and I'd happily drop the Makefile, to be honest, though I admit I see some small benefits of using a tailored Makefile.

> Isn't it fairly straightforward to run ct without rebar3? It's a command line tool.

It's possible, but the nice CLI output is provided by a custom Rebar3 CT hook, so if we run CT directly, we'll get uglier printouts.

> Add ct to the tests target and to .PHONY.

If we're happy with moving completely to CT, I can do that. For now, I considered this a nicer alternative for local development (especially comparing test results across builds), but did not include it in the CI, nor did I remove the original EUnit tests this is based on. In this light, it didn't make sense to run them twice, so the CT variants are not in tests.

zuiderkwast · 2024-06-11T15:04:23Z

test/gradualizer_dynamic_suite.erl

+    Module = ?config(dynamic_suite_module, Config),
+    ?assert(Module /= undefined),
+    case erlang:function_exported(Module, generated_tests, 0) of
+        true ->
+            {ok, Module:generated_tests()};


What's this test suite doing?

It looks like a lot of magic. I don't like magic. :)

There's a little bit of magic, but let's start from the standard CT conventions. CT treats files matching *_SUITE.erl as test suites, so gradualizer_dynamic_suite.erl is not a test suite, it's a helper file for use in test suites. The actual test suites match the previous tests defined in EUnit:

test/known_problems_should_fail_SUITE.erl

test/known_problems_should_pass_SUITE.erl

test/should_fail_SUITE.erl

test/should_pass_SUITE.erl.

The dynamic suite helper exports only one function, gradualizer_dynamic_suite:reload/1, which generates CT test cases based on the passed-in config and then dynamically reloads the module it's invoked from. It's idempotent, so calling it once, twice, or 100 times has the same effect. That's where the name "dynamic suite" comes from. The reason for this helper is that, unlike EUnit, CT has no standard way to generate tests dynamically.

The tests are generated for each of the test files defined under the respective should pass/fail/known problems directories.

The most enigmatic part is where to actually invoke gradualizer_dynamic_suite:reload/1 in a client test suite; I had to dive into the OTP CT code to figure this out. Apparently, *_SUITE:groups/0 is the first function CT calls when running a test suite, so it's the best place to call reload/1 from. Before @xxdavid's remark that listing cases by hand is tedious, we could call reload/1 from somewhere more logical, like init_per_suite/1, but if we want to avoid listing the to-be-generated tests manually, we have to play along with how CT runs the test suites.

zuiderkwast · 2024-06-11T15:06:14Z

test/known_problems_should_fail_SUITE.erl

@@ -0,0 +1,63 @@
+-module(known_problems_should_fail_SUITE).


Did you duplicate the eunit test suites as commontest suites?

I don't want duplicated logic. Delete the eunit test suites that have been ported to commontest.

I did, yes. If we're happy with moving completely to CT, I'll delete the EUnit tests.

zuiderkwast · 2024-06-11T15:08:57Z

test/should_pass_SUITE.erl

+     variable_binding_leaks].
+
+should_pass_template(_@File) ->
+    ?assertEqual(ok, gradualizer:type_check_file(_@File, [{form_check_timeout_ms, 2000}])).


IIRC, CT supports plain text reports too. I don't know it well, but if it's too complicated, maybe it isn't worth it.

If we run it ourselves instead of relying on rebar3, we'll have better control over it.

erszcz added 17 commits June 2, 2024 21:01

Sketch a Common Test dynamic suite shape

c66cfb4

Add Makefile ct rule which labels a test run with git commit

3993820

Get should_fail test generation to work

d97a9f4

Generate all tests for a given path

a46d2a0

Generate SUITE:all/0 dynamically

193dcb0

Sadly, this doesn't work, as init_per_suite/1 is called after all/0, so a suite with all/0 not returning any tests is not run :(

Define all/0 tests manually

81e8526

Make paths relative to app location

4213687

Use EUnit macros for CLI readability

57b9ee2

Clean up

fcb443d

Make test template function configurable, pass params from the top

f150d53

Extract gradualizer_dynamic_suite

34aa4a4

Add test/should_pass_SUITE.erl

2c4df55

Don't explicitly enable maybe_expr

b1d8222

Add test/known_problems_should_fail_SUITE.erl

e345dc1

Add test/known_problems_should_pass_SUITE.erl

314ca8b

Parallelize CT tests

b7d3c32

erszcz requested review from zuiderkwast and xxdavid June 2, 2024 21:11

erszcz changed the title ~~Use Common Test for testing when Gradualize should pass, fail, and its known problems~~ Use Common Test for testing when Gradualizer should pass, fail, and its known problems Jun 3, 2024

xxdavid reviewed Jun 4, 2024


erszcz force-pushed the use-ct branch from 0ede9ef to b31b33d Compare June 4, 2024 22:24

erszcz added 3 commits June 5, 2024 16:00

Avoid hardcoding names of tests which are to be generated

06e5066

Add a longer timeout in all generated tests

e6c91a8

Remove unnecessary stubs

dbd167f

erszcz force-pushed the use-ct branch from 1a53626 to dbd167f Compare June 5, 2024 14:00

zuiderkwast requested changes Jun 11, 2024


