Add library(testing) #2827

Status: Open. Wants to merge 5 commits into master.
Conversation

@bakaq (Contributor) commented Feb 15, 2025

This is a polished version of the testing framework that I made for library(dif) and that was eventually adapted for the internal library tests. It is now convenient enough to be used as a general-purpose testing framework. Features include:

  • Filtering tests by name.
  • Colors to aid comprehension (can be turned off for use in pipes).
  • Exits with a status code reflecting whether all the tests that were run passed, for use in scripting.
  • Tests can run in multiple modules, so you can write tests inline in your module's file. In my experience this gets quite tricky with meta-predicates and the limitations of the module system, so I don't recommend it in the documentation.
  • Errors and exceptions cause the test to fail, but they are reported to the user to aid debugging.
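
As a rough illustration of how such tests might be written (a hedged sketch: the exact API of this library(testing) is an assumption here; only the test/2 shape is implied by the test/3 discussion later in this thread):

```prolog
% Hypothetical usage sketch, NOT the PR's confirmed API: a test/2 fact
% pairing a descriptive name with a goal that must succeed to pass.
:- use_module(library(testing)).
:- use_module(library(lists)).

test("append/3 appends two lists", (
    append([1,2], [3], [1,2,3])
)).

test("dif/2 fails for identical terms", (
    \+ dif(a, a)
)).
```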

Here's an image of the tests in action:

[screenshot: colored test runner output]

This PR also ports all the internal tests that used the old framework to this one. It deletes the .stdout files so that stdout is no longer checked. If any test fails, the exit status will change, trycmd will treat that as a failure, and the rich test description will then be shown, making it easier to find the problem. An example of what cargo test would show if a test of library(dif) regressed:

[screenshot: cargo test output for a regressed library(dif) test]

@triska (Contributor) commented Feb 15, 2025

Please consider how @dcnorris approaches this in his branch: he managed to use embedded answer descriptions (see #2746 for preliminary support) as tests:

dcnorris@9c0bf2f

@hurufu (Contributor) commented Feb 15, 2025

Nice! Finally it is a separate library and not hidden in test code :) I already have a wish list of additional features that would be useful:

  1. The ability to mark certain tests as permanently skipped, and to print something like "skipped" in the console.
  2. For some test cases the expected outcome is failure or an exception. It would be nice if the framework could handle this without manually written wrappers for every such case (an explicit test outcome option, maybe?).
  3. A way to see how many solutions a test case has produced; sometimes you have to test that a given predicate gives only a single solution.
  4. Detection of possible non-termination (achievable using call_with_inference_limit/3). Test suites are run by automated systems, so there should be a way to terminate long-running tests.
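
For what it's worth, items 2 through 4 can be approximated today with small wrapper goals. A minimal sketch, assuming the standard catch/3, findall/3, and length/2, plus the call_with_inference_limit/3 mentioned above (all wrapper names below are hypothetical and not part of the PR):

```prolog
% Hypothetical wrappers, not part of library(testing).

% Passes when Goal fails.
expect_failure(Goal) :-
    \+ call(Goal).

% Passes when Goal throws any exception.
expect_error(Goal) :-
    catch((call(Goal), false), _, true).

% Passes when Goal has exactly N solutions.
expect_solutions(Goal, N) :-
    findall(x, Goal, Xs),
    length(Xs, N).

% Fails (rather than looping) when Goal exceeds the inference budget.
bounded(Goal, Limit) :-
    call_with_inference_limit(Goal, Limit, Result),
    Result \== inference_limit_exceeded.
```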

@bakaq (Contributor, Author) commented Feb 15, 2025

Please consider how @dcnorris approaches this in his branch, he managed to use embedded answer descriptions

I think quad-based tests are complementary to this more general test framework. Things that are easy in one system are hard in the other. For example, I think filtering (which is an extremely useful feature to have) would be much harder with quads, but testing individual queries (and sharing them with others) is much more convenient there. I intend to build a property-based testing framework on top of this one, and I think it's a much better fit for that than quads.

I already have a wish list of additional features that are useful: [...]

Apart from the skipping, all of these can already be done manually with wrappers. I plan to eventually add support for test/3 tests, which take a list of options as a second argument to provide some of these features. Something like this:

test("name of test", [
    expect(error),
    skip,
    num_solutions(3),
    other_option
  ], (
    test_body
  )
).

For now I just implemented the 20% that gives 80% of the value.

@triska (Contributor) commented Feb 15, 2025

Please reconsider this direction: a testing framework that requires specifying test cases in a way so different from what the toplevel emits will pose very serious challenges and mental overhead every time it is used.

@dcnorris commented

I intend to build a property based testing framework on top of this one, and I think it's much more fitting than quads for that.

Assuming you're referring to https://en.wikipedia.org/wiki/Software_testing#Property_testing, and that the essence of the approach is its generative aspect, then I believe I've accomplished something of this sort quite naturally in quads-based tests for some numerics functions: https://github.com/dcnorris/scryer-prolog/blob/ea714f13fb5ebba99004fd11abd481210ee7b864/src/lib/numerics/special_functions.pl#L49C1-L59C10

To be sure, I've had to introduce domain-specific predicates — such as odd_t/3 and δ_inverses_t/5 in https://github.com/dcnorris/scryer-prolog/blob/special/src/lib/numerics/testutils.pl. But enriching the domain of discourse in this way seems to me very much in the state-what-holds spirit of pure Prolog.

@jjtolton commented Feb 18, 2025

Please reconsider this direction: A testing framework that requires specifying test cases in such a different way compared to what the toplevel emits will pose very serious challenges and mental overhead every time it is used.

I'm confused; I'd like to invite a discussion on the semantics of testing. Also, quads.
