Working with test images

Test media

The metadata-extractor project maintains a set of media files (images, video, audio) for experimentation and testing. Most of these images were kindly donated by users of the library, and we welcome contributions that extend the set in interesting ways. Please ensure you have permission to share any images you donate.

Regression tests produce output from both the Java and .NET implementations, allowing each to track changes over time. It also encourages parity between the two implementations, allowing code to be shared between each with some minor modifications.

This test data lives in its own git repository (source, wiki). It's quite a large repository, requiring some time and space to clone. Thankfully you should only have to do that once.

Running regression tests

When working on the metadata-extractor library it's important to test for regressions using the test data.

To do this, you should check out all three repositories side-by-side:

$ git clone [email protected]:drewnoakes/metadata-extractor.git
$ git clone [email protected]:drewnoakes/metadata-extractor-dotnet.git
$ git clone [email protected]:drewnoakes/metadata-extractor-images.git

The Java library must be built first.

Then, open the .NET solution in the metadata-extractor-images/src/dotnet folder. This contains a single project that will run both the Java and .NET libraries on every file from the metadata-extractor-images library, writing the resulting output to text files, then producing diff files between the Java and .NET versions. These text files are committed to the repository, meaning you can use standard git tooling to highlight any changes to the output.

You can also browse the text files to look for places where the libraries differ, or for errors, and use that information to improve the library.

Unit testing

Unlike most projects, we tend to avoid adding many unit tests. Instead, we use the metadata-extractor-images repo to track regressions. Often, adding several variations of images that exercise a piece of code does a better job of ensuring correctness than a handwritten unit test.

Java / .NET runtime versions

Output will vary slightly, depending upon the framework version. For example, exception strings change, and even the presentation of floating point numbers can change.

The text files in the repo are currently created using net48 and Java 18. I'd like coverage of net8.0 and other Java versions too but haven't worked out what to do with the diff files in such cases.

If you are testing a scenario specific to a different framework, I'd suggest regenerating the text files using whatever framework you wish to target without your code changes. Then stage or commit any changes in order to get a clean baseline. Then, you can introduce your changes and re-run the generator to see the diff specific to your actual changes.

Extracting segments from test images

However, there are cases that require bytes from the image file, usually from a known container (e.g. JPEG segment). Instead of adding the whole image, you can add a (small) segment to the source repository to assist with testing. There are several unit tests that demonstrate this process.

The library has a command line tool for extracting segments from JPEG files.

But again, before extracting segments, ask yourself whether adding the image to the metadata-extractor-images repository, manually checking the initial metadata file, and later seeing regressions (or improvements) happen automatically via diffs is very useful. Testing the whole image does a much better job of assuring the software works correctly in most cases.

Provide feedback

Saved searches