Bobby Martin edited this page Apr 3, 2020 · 14 revisions

Overview

The rabin tool is a command line program (rabin) that lets you experiment with Rabin fingerprints: breaking files up into Rabin chunks and compressing files that have redundancy in their content.

Let’s say you want to:

  • build binary diff utilities
  • store multiple versions of a set of files with minimal space used
  • get a little better compression than gzip alone
  • check files to see if they have a virus signature embedded in them

The rabin tool lets you do all of those things the Unix way, by writing a script that combines rabin commands with other commands such as grep and cat.

To build & play with rabin-tool

Download the source

Unarchive the source, and open a shell in the root directory. Then:

make
cd src
./mkrabin.sh

That should build the rabin executable.

What the tool is good for

The rabin executable can be used in several modes. It can compress files that have redundancy (rabin followed by gzip typically produces output about 15% smaller than gzip alone), and it can decompress rabin-compressed files. Here’s an example of using rabin to compress and extract:

./rabin -c "some file" > "some file.rabin"
ls -l "some file"*
./rabin -x -o "some file.after rabin" "some file.rabin"
diff "some file" "some file.after rabin"
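
One plausible reason that removing repeated content ahead of gzip can help is that DEFLATE (the algorithm behind gzip) only looks back 32 KiB for matches, so a repeat that sits further away goes unnoticed. The following is a minimal Python sketch of that effect using the standard zlib module. It assumes nothing about rabin's actual file format, and the one-byte "repeat marker" is purely hypothetical.

```python
import random
import zlib

rng = random.Random(0)
# 64 KiB of pseudo-random (incompressible) bytes, stored twice in a row.
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = block + block

# DEFLATE's match window is 32 KiB, so it cannot see that the
# second half of `data` repeats the first half.
plain_size = len(zlib.compress(data, 9))

# Hypothetical dedup step: keep the block once, plus one byte that
# stands in for a "this block occurs twice" marker.
dedup_size = len(zlib.compress(block, 9)) + 1

print(plain_size, dedup_size)
```

On this artificial input the deduplicated form is roughly half the size. Real files have far less long-range redundancy, so the gain there is much smaller.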

The tool can also break a number of files down into chunks and store all the chunks in one central repository, producing a manifest of the chunks for each file. It can then reassemble those chunks to reproduce any of the original files. After breaking files down into chunks this way, you can (for example) sort the chunk lists of two files and compare them to see approximately how much content the two files have in common.
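
The chunk comparison idea can be sketched in Python as follows. This is not the rabin tool's code or manifest format: it swaps in a simple polynomial rolling hash (modulo a prime) for a true Rabin fingerprint, and the names chunks and chunk_ids, the window size, and the cut mask are all assumptions made for illustration.

```python
import hashlib
import random

WINDOW = 16            # bytes in the sliding window (assumed value)
MASK = (1 << 6) - 1    # cut when the low 6 bits are zero: ~64-byte chunks on average
BASE = 257
MOD = (1 << 61) - 1    # a Mersenne prime

def chunks(data: bytes):
    """Split data where a rolling hash of the last WINDOW bytes hits the mask."""
    out, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + b) % MOD                     # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])       # trailing partial chunk
    return out

def chunk_ids(data: bytes):
    """Identify each chunk by its SHA-1 digest, as a chunk list would."""
    return {hashlib.sha1(c).hexdigest() for c in chunks(data)}

rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = original[:5000] + b"a few inserted bytes" + original[5000:]

a, b = chunk_ids(original), chunk_ids(edited)
shared = len(a & b) / len(a | b)
print(f"{len(a)} vs {len(b)} chunks, {shared:.0%} shared")
```

Because the cut points depend only on the bytes inside the window, an insertion disturbs only the chunks near the edit. The chunk boundaries after it line up again, which is what makes comparing chunk lists a usable estimate of overlap.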

To see all the Rabin options:
./rabin -?

Credits

The rabin tool is based on the Sliding Window Based Rabin Fingerprint Computation Library. Here are the docs for that library.
(Note that the following are NOT the docs for this command line tool!)

o Function
Given a stream of input, compute the rabin-fingerprint of an N-byte-long
sliding window, reading the bytes of the stream one by one

o Usage
– The rabinpoly library provides a very simple interface that
1) defines the sliding window size,
2) reads bytes one by one and returns the rabin-fingerprint of
the current sliding window, and
3) resets the sliding window.

– Look at the example in the example/ directory. The example code reader.cc takes a file and reports the rabin-fingerprints of overlapping 64-byte-long sliding windows, reading the bytes one by one. To compile the example program, read the README in that directory.
– You can find the details of the Rabin fingerprint in “Rabin, M. O. Fingerprinting by Random Polynomials. Tech. Rep. TR-15-81. Center for Research in Computing Technology, Harvard University, 1981.”
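
The three operations listed above (set the window size, feed bytes one at a time, reset) can be illustrated with a small Python sketch. This is not the rabinpoly API: the class name SlidingFingerprint and its methods are hypothetical, and a polynomial rolling hash modulo a prime stands in for the real Rabin fingerprint.

```python
class SlidingFingerprint:
    """Hypothetical stand-in for a sliding-window fingerprint interface."""
    BASE = 257
    MOD = (1 << 61) - 1   # a Mersenne prime

    def __init__(self, window_size: int):
        # 1) define the sliding window size
        self.size = window_size
        self.top = pow(self.BASE, window_size - 1, self.MOD)
        self.reset()

    def reset(self) -> None:
        # 3) reset the sliding window to an all-zero state
        self.buf = bytearray(self.size)   # circular buffer of window bytes
        self.pos = 0
        self.fp = 0

    def slide(self, byte: int) -> int:
        # 2) read one byte and return the fingerprint of the current window
        old = self.buf[self.pos]
        self.buf[self.pos] = byte
        self.pos = (self.pos + 1) % self.size
        # Remove the oldest byte's contribution, shift, add the new byte.
        self.fp = ((self.fp - old * self.top) * self.BASE + byte) % self.MOD
        return self.fp

fp = SlidingFingerprint(4)
first = [fp.slide(b) for b in b"xxabcd"][-1]
fp.reset()
second = [fp.slide(b) for b in b"abcd"][-1]
print(first == second)   # both windows hold b"abcd", so the values match
```

The value returned depends only on the last window_size bytes seen, which is the property that lets a fingerprint be updated in constant time per byte instead of being recomputed over the whole window.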

o History
– Taken from David Mazieres’s LBFS implementation and modified as a
stand-alone version by Niraj Tolia et al. (For Content Addressable Storage)
– Modified to support a configurable sliding window size and packaged by
Hyang-Ah Kim
– 2005.12.2 version 1.0a
Added an example file which was missing in the initial distribution
