Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Common Test for testing when Gradualizer should pass, fail, and its known problems #567

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

erszcz
Copy link
Collaborator

@erszcz erszcz commented Jun 2, 2024

This PR adds ability to test Gradualizer functionality with Common Test (aka CT), which reports results in a way that enables much more convenient comparisons between builds. CT generates HTML test run reports. The Makefile rule, make ct, labels test runs with git commit info and a modified annotation, if a run tested a modified working directory. All in all, when introducing a change, this testing framework makes it easier to spot what broke and compare results between various builds.

For now, it's not integrated with CI.

Usage:

make ct

or if we're only interested in a specific suite:

rebar3 ct --suite=known_problems_should_pass_SUITE --label="git: $(git describe --tags --always) $(git diff --no-ext-diff --quiet --exit-code || echo '(modified)')"

Example results

Command line:

Screenshot 2024-06-04 at 09 32 35

Please note that line numbers are always the same, since the CT tests (test functions) are generated from a template. However, the suite and test name are sufficient to identify the typechecked file the CT test corresponds to.

HTML listing of all runs (just 2 in this case):

Screenshot 2024-06-04 at 09 41 44

Two runs compared side by side:

Screenshot 2024-06-04 at 09 35 28 (2)

@erszcz erszcz changed the title Use Common Test for testing when Gradualize should pass, fail, and its known problems Use Common Test for testing when Gradualizer should pass, fail, and its known problems Jun 3, 2024
Copy link
Collaborator

@xxdavid xxdavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really familiar with Common Test and I also don't fully understand all the magic in gradualizer_dynamic_suite. But under this disclaimer, this PR looks quite good to me (apart from a few comments I left). The output is nicer compared to EUnit and, as you said, it's probably easier to compare between different versions of Gradualizer. It's a bit slower than the EUnit tests (15 secs vs 10 secs on my machine) but that's not a problem I think. Overall, I consider it as an improvement over the EUnit tests. Thanks for implementing it!

Do you plan to replace the EUnit tests for should_* with these CT tests, or do you want them to coexist (which would cause duplicated code)?

Thanks for extending the PR description and adding screenshots, it made it much easier to grasp the intended usage. If this gets merged, maybe we should also document it somewhere.

test/should_fail_SUITE.erl Outdated Show resolved Hide resolved
test/should_fail_SUITE.erl Outdated Show resolved Hide resolved
variable_binding_leaks].

should_pass_template(_@File) ->
?assertEqual(ok, gradualizer:type_check_file(_@File, [{form_check_timeout_ms, 2000}])).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EUnit tests print a Gradualizer's error message describing the type error when a test fails. I find it useful for quickly glancing over what's going on without having to inspect all the failed test files one by one. Do you think it's possible with CT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's available in the CT HTML report, but that requires a little bit of clicking in the browser. It might be possible to get in the shell, too, but I'd have to think some more about it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, CT supports plain text reports too. I don't know it well, but if it's too complicated, maybe it isn's worth it.

If we run it ourselves instead of relying on rebar3, we'll have better control over it.

@@ -149,6 +149,9 @@ eunit: compile-tests
erl $(ERL_OPTS) -noinput -pa ebin -pa test -eval \
'$(erl_run_eunit), halt().'

ct:
@rebar3 ct --label "git: $$(git describe --tags --always) $$(git diff --no-ext-diff --quiet --exit-code || echo '(modified)')"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made this whole makefile work without rebar3 because of some annoyances we had with it and to get better control of what's happening. If we mix rebar3 with non-rebar3 we'll have files built in various place and a mess in general. I don't like that.

Isn't it fairly straitforward to run ct without rebar3? It's a command line tool.

Add ct to the tests target and to .PHONY.

We should be able to modify the logic we have in erl_cover_run function to have coverage computed on eunit and commontest combined, or only commontest if we port all eunit tests to commontest.

To run a specific suite, we can do something similar to what erlang.mk does, e.g. make ct suite=known_problems_should_pass_SUITE.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We made this whole makefile work without rebar3...

Generally, I think rebar3 is quite mature and I'd happily drop the Makefile, to be honest. though I admit I see some small benefits of using a tailored Makefile.

Isn't it fairly straitforward to run ct without rebar3? It's a command line tool.

It's possible, but the nice CLI output is provided by a custom Rebar3 CT hook, so if we run CT directly, we'll get uglier printouts.

Add ct to the tests target and to .PHONY.

If we're happy with moving completely to CT, I can do that. For now, I considered this a nicer alternative for local development (especially comparing test results across builds), but did not include it in the CI, nor did I remove the original EUnit tests this is based on. In this light, it didn't make sense to run them twice, so the CT variants are not in tests.

Comment on lines +9 to +13
Module = ?config(dynamic_suite_module, Config),
?assert(Module /= undefined),
case erlang:function_exported(Module, generated_tests, 0) of
true ->
{ok, Module:generated_tests()};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this test suite doing?

It looks like a lot of magic. I don't like magic. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a little bit of magic, but let's start from the standard CT conventions. CT treats files matching *_SUITE.erl as test suites, so gradualizer_dynamic_suite.erl is not a test suite, it's a helper file for use in test suites. The actual test suites match the previous tests defined in EUnit:

  • test/known_problems_should_fail_SUITE.erl
  • test/known_problems_should_pass_SUITE.erl
  • test/should_fail_SUITE.erl
  • test/should_pass_SUITE.erl.

The dynamic suite helper exports only one function - gradualizer_dynamic_suite:reload/1 - which generates CT test cases based on passed in config and then dynamically reloads the module it's invoked from. It's idempotent, so calling it once, twice, or 100 times has the same effect. That's where the name comes from - "dynamic suite". The reason for this helper is that, unlike EUnit, CT has no standard way to generate tests dynamically.

The tests are generated for each of the test files defined under the respective should pass/fail/known problems directories.

The most enigmatic part is where to actually invoke gradualizer_dynamic_suite:reload/1 in a client test suite - this is something I had to dive in OTP CT code to figure out. Apparently, *_SUITE:groups/0 is the first function CT calls when running a test suite, so it's the best place to call reload/1 from. Before @xxdavid's remark that listing cases by hand is tedious we could call reload/1 from something more logical, like init_per_suite/1, but if we want to avoid listing the to-be-generated tests manually, we have to play with how CT runs the test suites.

@@ -0,0 +1,63 @@
-module(known_problems_should_fail_SUITE).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you duplicate the eunit test suites as commontest suite?

I don't want duplicated logic. Delete the eunit test suites that have been ported to commontest.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, yes. If we're happy with moving completely to CT, I'll delete the EUnit tests.

variable_binding_leaks].

should_pass_template(_@File) ->
?assertEqual(ok, gradualizer:type_check_file(_@File, [{form_check_timeout_ms, 2000}])).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, CT supports plain text reports too. I don't know it well, but if it's too complicated, maybe it isn's worth it.

If we run it ourselves instead of relying on rebar3, we'll have better control over it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants