Vladimir Panteleev edited this page Mar 4, 2012 · 64 revisions

What is DustMite?

DustMite is a tool which minimizes D source code. It was inspired by Tigris Delta and a thread on digitalmars.D.learn.

DustMite will parse the source code into a simple hierarchy, and attempt to shrink it by deleting fragments iteratively, as long as the result satisfies a user-specified condition.

Building DustMite from source is simple; see Building DustMite for instructions.

What can it be used for?

  1. For compiler developers: reducing compiler bug test cases.
  2. Finding the source of ambiguous or misleading compiler error messages (e.g. errors with the file/line information pointing inside Phobos)
  3. Alternative unit test code coverage (DustMite can remove all code that does not affect the execution of your unit tests - see below).
  4. Similarly, if you have complete test coverage, it can be used for reducing the source tree to a minimal tree which includes support for only enabled unittests. This can be used to create a version of a program or library with a test-defined subset of features.
  5. The --obfuscate option can obfuscate your code's identifiers.
  6. It can be easily adapted to work with other languages and file formats.

How is it better than Tigris Delta?

  • Easy to use (takes only two arguments, no need to fiddle with levels, catches common user mistakes)
  • Extra features
  • Native Windows support
  • Readable output (comments and indentation are preserved)
  • Native support for multiple files (accepts a path to an entire directory for input)
  • Written for D
  • Written in D
  • Not written in Perl
  • Can recognize constructs such as try/catch, function invariants (in/out/body)

How to use it?

  1. Formulate a condition command, which should exit with a status code of 0 when DustMite is on the right track, and anything else otherwise.
    • Example: dmd test.d 2>&1 | grep -qF "Assertion failed"
    • Your command will be run from inside the testing directory. You should use relative paths to the files that are being reduced.
    • It is recommended that your test command doesn't print anything to either stdout or stderr, as any output will break the progress indicator.
    • For non-trivial tests, you may want to place the commands in a shell script.
    • You can find some useful test script snippets here.
  2. Copy all the files that DustMite is to minimize to a new directory.
    • DustMite can minimize single files as well. name.ext will be treated like name/name.ext.
  3. If you'd like to test your condition command at this point, don't forget to clean up temporary files afterwards.
    DustMite will try to reduce all files in the specified directory. At the moment, however, for non-.d files it goes no further than trying to remove them entirely.
  4. Run: dustmite path/to/directory test-command
    • You may safely terminate DustMite at any point. The current-best results will be in path/to/directory.reduced.
  5. After running out of nodes to try to remove, dustmite will exit. The reduced tree will be in path/to/directory.reduced.
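For non-trivial conditions, the command from step 1 can live in a small script. The following is a minimal sketch, not part of DustMite: the function name check, the COMPILER and PATTERN variables, and the default message are placeholders chosen for illustration.

```sh
#!/bin/sh
# Hypothetical condition script for DustMite (all names are illustrative).

# Compiler command and the message being chased; both are placeholders.
COMPILER=${COMPILER:-dmd}
PATTERN=${PATTERN:-Assertion failed}

# check FILE...: succeed (status 0) only if compiling FILE still produces PATTERN.
check() {
    # Capture stdout and stderr so nothing reaches the terminal;
    # stray output would break DustMite's progress indicator.
    # The compiler's own exit status is ignored: only the message matters.
    check_output=$("$COMPILER" "$@" 2>&1) || true
    # -q keeps grep silent, -F treats PATTERN as a fixed string.
    printf '%s\n' "$check_output" | grep -qF -- "$PATTERN"
}
```

To use it as a script, end the file with a call such as check test.d, giving the path relative to the testing directory.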

Troubleshooting

Initial test fails

The command you specified returned a non-zero status for the input data set. DustMite can't know which reductions are helpful if it can't start with a known good state.

Result directory already exists

The directory path/to/directory.reduced exists, probably from a previous DustMite run. DustMite will exit to avoid overwriting a previous run's results. Delete the directory to start a fresh run.

Error while rmdir-ing

DustMite couldn't clean up a temporary directory. This can happen on Windows if the directory or one of its files is opened by another process, e.g. a file manager or an antivirus program. DustMite will automatically retry in a second; if the problem doesn't go away, try using Process Explorer to find the process keeping an open handle to the specified file / directory.

How does it work?

A DustMite run has the following general outline:

  1. Parse options and validate input.
  2. Load all files to memory, and parse .d files into a hierarchical data structure.
    • The data structure is a tree, where each node has a head, a list of children, and a tail.
    • head and tail are strings – slices over the original file.
    • The top-most level of the hierarchy represents files. The second level represents top-level constructs in each file, etc. Files don't have a head or tail, only the filename and children.
    • When traversing the tree in the order head - children - tail, all of the slices will cover the entire file, without any gaps or overlapping.
    • The basic idea of parsing D source code is simple: } and ; are block-terminators.
  3. Optimize the tree. Currently, this is done by rearranging all children into a binary tree.
  4. "Test" the input data without any modifications. If the test command fails (exits with a non-zero status), abort.
    • "Testing" means saving the current hierarchy to a temporary directory (path/to/directory.test), chdir-ing to it, and running the user-specified command.
  5. Iteratively attempt to remove subtrees of the data. If the test command succeeds after removing a subtree, the subtree is removed permanently and the process continues.
  6. When DustMite can't find a node to remove that doesn't cause the test command to fail, it considers the reduction complete.
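Steps 4 to 6 can be illustrated with a deliberately simplified sketch. It is not DustMite's algorithm: it removes whole lines of a single file instead of subtrees of a parsed hierarchy, and it passes the candidate file to the test command as an argument instead of chdir-ing into a test directory. The function name reduce_lines and its calling convention are assumptions made for this example.

```sh
#!/bin/sh
# Simplified illustration of steps 4 to 6 (line-based, not tree-based).
# Assumes the input file is newline-terminated.

# reduce_lines FILE TEST_COMMAND...
# TEST_COMMAND is called with a file path appended and must exit 0
# while that file still satisfies the condition.
reduce_lines() {
    rl_file=$1
    shift
    rl_candidate="$rl_file.test"

    # Step 4: the unmodified input must pass, or there is no known good state.
    if ! "$@" "$rl_file"; then
        echo "Initial test fails" >&2
        return 1
    fi

    rl_changed=1
    while [ "$rl_changed" -eq 1 ]; do
        rl_changed=0
        rl_i=1
        rl_total=$(( $(wc -l < "$rl_file") ))
        while [ "$rl_i" -le "$rl_total" ]; do
            # Step 5: write a candidate without line rl_i and test it.
            sed "${rl_i}d" "$rl_file" > "$rl_candidate"
            if "$@" "$rl_candidate"; then
                # The condition still holds, so the removal becomes permanent.
                mv "$rl_candidate" "$rl_file"
                rl_total=$((rl_total - 1))
                rl_changed=1
            else
                rl_i=$((rl_i + 1))
            fi
        done
    done
    # Step 6: a full pass removed nothing, so the reduction is complete.
    rm -f "$rl_candidate"
}
```

The outer loop repeats because removing one line can make an earlier line removable; the sketch stops only after a complete pass in which every attempted removal made the test fail.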

Advanced usage

Windows is slow

Because process creation is comparatively much slower on Windows, a minimization will take a lot more time on Windows than on Unix-based operating systems. If possible, use a Unix-based OS to minimize large code bases.

Intermediate results

You can preview intermediate results by peeking inside path/to/directory.reduced. You should have no problems on *nix, but on Windows entering the directory may cause DustMite to pause, since it may not be able to clean up the directory when overwriting the files. DustMite will keep retrying automatically, and will resume once you leave the directory.

Command-line options

DustMite has a few command-line options. Run dustmite --help to see them.

Useful test scripts

You can find several test scripts for common tasks (e.g. timeouts, detecting specific segfaults) on this page.

Minimizing the standard library

A fully-minimized test case shouldn't depend on the standard library. To minimize Phobos along with the rest of your code, you can do something along the lines of:

  1. Copy the std directory to the input directory
  2. Rename it to mystd
  3. Search and replace std. with mystd. (in both Phobos and your code)
    • Example command: find . -name '*.d' | xargs perl -pi -e 's/\bstd\./mystd./g'

Alternatively, after copying Phobos to your project, remove the standard location from the compiler search path. You will also need to explicitly compile and link your local version of the Phobos sources together with your code - otherwise, the linker will use the version of the code from the pre-compiled static library. If you build your test case with a build tool, it should take care of this.

Selective minimization

Selective minimization may be useful if you don't want to remove certain blocks from the input, which would otherwise satisfy your test condition.
For example, you may not want DustMite to remove unittest blocks if you're testing for unit test coverage.

dustmite has a --noremove option, which takes a regular expression. DustMite will not remove nodes whose head or tail is covered by any of the specified regular expressions.

--noremove also applies to file names. File names are the files' paths relative to the root of the test directory, and use forward slashes as directory separators on all platforms.

You can also surround code that is not to be removed with the magic words DustMiteNoRemoveStart and DustMiteNoRemoveStop. Note that if you place them in comments, you won't be able to use --strip-comments.

If you need DustMite not to remove parts of the code that actually get executed at runtime, you can use DMD's -cov option in combination with DustMite's --coverage option to instruct DustMite not to remove covered lines.

Alternatively, you may:

  • test the presence of the desired blocks in your test script using a combination of grep / wc -l
  • place unittests/etc. in a separate file outside of the input file set, then copy or concatenate it before testing
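The first alternative could look roughly like this inside a test script. It is a sketch under assumptions: the function names, the EXPECTED_UNITTESTS variable and its default, and the rule that a unittest block starts at the beginning of a line are illustrative choices, not DustMite features.

```sh
#!/bin/sh
# Hypothetical helper for a test script: reject reductions that drop unittest blocks.

# Number of unittest blocks the reduced code must still contain (placeholder value).
EXPECTED_UNITTESTS=${EXPECTED_UNITTESTS:-3}

# count_unittests FILE...: count lines that open a unittest block.
# The pattern requires "unittest" as a whole word at the start of a line
# (after optional indentation), so an identifier like "unittests" is not counted.
count_unittests() {
    cat "$@" | grep -E '^[[:space:]]*unittest([^[:alnum:]_]|$)' | wc -l
}

# unittests_intact FILE...: succeed only if the expected number of blocks remains.
unittests_intact() {
    # $(( ... )) strips the padding some wc implementations add.
    [ "$(( $(count_unittests "$@") ))" -eq "$EXPECTED_UNITTESTS" ]
}
```

A test script would call unittests_intact on the files being reduced first, and run the actual condition command only if it succeeds.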

Ordered minimization

Combine several dustmite calls with selective minimization (e.g. different --noremove parameters) to control the order of minimization. Example:

# Reduce everything outside the gtkd package
dustmite testdir ../testscript --noremove "^gtkd/.*\.d$"
# Now reduce everything else
mv testdir.reduced testdir_pass2
dustmite testdir_pass2 ../testscript
# Final result is now in testdir_pass2.reduced

Obfuscation

DustMite can obfuscate your program if you pass the --obfuscate switch. In this mode, DustMite will collect a list of words in the program, and attempt to substitute each in turn with incremental or randomly-generated ones.

By default, DustMite will generate substitutions in lexicographical order. If you need to preserve identifier lengths (e.g. when reducing linker problems), use the --keep-length switch.

Custom parsers

If you'd like to add support for a custom language or file format, see the Entity structure and the loadFile function from the dsplit module.