Investigate Running all Example code for validity #824
Comments
A really important point here is that we need some consistent way to run all of the tests in every language (#691). In particular, certain languages (like Julia / Rust) should come with all the necessary files to load their dependencies.
Thinking of the architecture for this... I'm imagining a scenario where we iterate over every algorithm in the archive. Each language would then be passed to a factory or builder that creates an object we can execute and get results from. (Not sure if this is the perfect pattern.) Adding support for a language would require implementing the interface in the factory and returning that object, or adding support to the builder. All of the other solutions I've thought of have been much less elegant than this. I still need to think about whether this is the right design pattern and would appreciate insight from some real architects.
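For illustration, a minimal sketch of that factory idea in Python might look like the following; the class names (`Runner`, `get_runner`), the commands, and the supported extensions are placeholders, not a settled design:

```python
# Rough sketch of the factory idea: every supported language registers a
# Runner that knows how to execute a submission and hand back its output.
import subprocess
from pathlib import Path


class Runner:
    """Interface: run a single code file and return its stdout."""
    def run(self, source: Path) -> str:
        raise NotImplementedError


class PythonRunner(Runner):
    def run(self, source: Path) -> str:
        return subprocess.run(["python", str(source)], capture_output=True,
                              text=True, check=True).stdout


class JuliaRunner(Runner):
    def run(self, source: Path) -> str:
        return subprocess.run(["julia", str(source)], capture_output=True,
                              text=True, check=True).stdout


# The "factory": adding a language means adding one Runner and one entry here.
RUNNERS = {".py": PythonRunner(), ".jl": JuliaRunner()}


def get_runner(source: Path) -> Runner:
    return RUNNERS[source.suffix]
```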
I think the main part would be creating a standardized testing format, as most of the examples currently don't have any tests associated with them (and if they do, the formats vary a lot). If the output followed a specific format, it would only come down to running the example with its appropriate interpreter/compiler/etc., getting the results from standard output or a file, and validating them. For non-compiled languages this should be quite easy, as long as all the necessary libraries are provided.
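Assuming the result ends up on stdout, the validation step could be as simple as this sketch; the commands and file locations are hypothetical:

```python
# Sketch: run one example with its interpreter and compare its stdout against
# a stored reference file.
import subprocess
from pathlib import Path


def check_example(command: list[str], expected_file: Path) -> bool:
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        return False
    return result.stdout.strip() == expected_file.read_text().strip()


# Example usage (hypothetical paths):
# check_example(["julia", "bubble_sort.jl"], Path("expected/bubble_sort.txt"))
```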
This is a good point. We need a list of all the implementations without tests (I think almost all of them have at least a simple integration test, but we might be missing a few), and a list of all implementations with tests that are hard to check between languages. For the output... most of the time we are outputting some .dat file; however, sometimes these files are specifically meant to be plotted to make an image that looks a certain way, which is hard to test properly. I think another big problem here is that I had the "bright" idea to use these plot-style outputs in the first place. So maybe we should make sure that each test creates data files that are "close enough" to each other? I mean, we can check "close enough" within some tolerance.
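A "close enough" check on two .dat files could look something like this sketch (the file format and the tolerance value are assumptions for illustration):

```python
# Sketch: compare two whitespace-separated .dat files value by value, allowing
# small numerical differences between languages.
def dat_files_close(path_a: str, path_b: str, tol: float = 1e-6) -> bool:
    with open(path_a) as fa, open(path_b) as fb:
        lines_a, lines_b = fa.readlines(), fb.readlines()
    if len(lines_a) != len(lines_b):
        return False
    for line_a, line_b in zip(lines_a, lines_b):
        vals_a, vals_b = line_a.split(), line_b.split()
        if len(vals_a) != len(vals_b):
            return False
        for a, b in zip(vals_a, vals_b):
            if abs(float(a) - float(b)) > tol:
                return False
    return True
```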
I just did a quick check and noticed that 16/22 Julia files do write to standard output (mostly older ones), but changing them to write to a .dat file instead should not be a problem. For chapters where the output does not just consist of many points, the format should also be specified in some way. This could be circumvented, however, by defining that lines starting with a designated marker are treated as informational and skipped. I agree that outputting to .dat files and comparing them approximately would work best here.
I would also like to point out that outputting to stdout is probably a valid option as well, since redirection can be used to write it to a file. I think we should also investigate how not having any program output in the AAA would affect future plans and ideas (such as showing the output on the site).
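As a quick illustration of that redirection idea (file names here are made up), a stdout-only example can still produce a comparable file:

```python
# Sketch: run an example that prints to stdout and redirect that output into a
# .dat file, so stdout-only examples still produce a file we can compare.
import subprocess

with open("output.dat", "w") as out:
    subprocess.run(["julia", "example.jl"], stdout=out, check=True)
```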
Now that I think about it, I would prefer outputting to stdout. In most languages it is easier to output directly to the console instead of a file (which leads to more concise code), and complementary informational output such as status messages can be included alongside the results. This would also make it easier to enable showing the output on the AAA directly in the future. If outputting to stdout were the standard, it would just come down to creating the expected outputs, then running the examples and comparing against them. This should not be that hard for the more popular languages with 10 or more submissions. For reference, this is an overview of how many submissions there are per language: [Click]. I believe that targeting the heavily used ones would probably be fine as a first step.
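Creating those expected outputs could itself be automated; a small sketch (the function name and paths are hypothetical):

```python
# Sketch: record the current stdout of an example as its expected output, so
# later runs (in any language) can be compared against the same reference.
import subprocess
from pathlib import Path


def record_expected(command: list[str], expected_file: Path) -> None:
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    expected_file.parent.mkdir(parents=True, exist_ok=True)
    expected_file.write_text(result.stdout)
```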
After some discussion on Discord, this is what has come out of it:
Lines starting with a designated comment character are treated as informational output and are ignored when comparing results.
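A sketch of how such informational lines could be stripped before comparison; the `#` marker here is only an assumption for illustration:

```python
# Sketch: drop informational lines (assumed here to start with '#') before
# comparing two program outputs line by line.
def strip_info_lines(output: str, marker: str = "#") -> list[str]:
    return [line for line in output.splitlines()
            if line.strip() and not line.lstrip().startswith(marker)]


def outputs_match(out_a: str, out_b: str) -> bool:
    return strip_info_lines(out_a) == strip_info_lines(out_b)
```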
As per #864, we decided on the approach described there.
Feature Request
Add support for running tests and verification for all code in the AAA
Description
Using globbing and a bit of comparison magic, we could associate all code file extensions with a specific interpreter/compiler/transpiler/etc., and then run the code (see the sketch below).
Once run, we could compare all outputs of each variation of the same implementation for validity and stability.
This would make adding new examples in an existing language easier, as it would require far less domain knowledge of each individual language.
This would have the side effect of complicating esoteric-language submissions, as each new language would also need to be added to our comparison/support set.
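A minimal sketch of the globbing idea mentioned above; the directory layout, extensions, and commands are assumptions rather than a finished design:

```python
# Sketch: map file extensions to the command used to run them, then walk the
# repository for code files and execute each one.
import subprocess
from pathlib import Path

COMMANDS = {".py": ["python"], ".jl": ["julia"], ".js": ["node"]}

for path in Path("contents").rglob("*"):
    cmd = COMMANDS.get(path.suffix)
    if cmd is None or not path.is_file():
        continue
    result = subprocess.run(cmd + [str(path)], capture_output=True, text=True)
    print(path, "OK" if result.returncode == 0 else "FAILED")
```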
Additional context
This will probably be pretty complicated and need custom tooling