FS-make-simple/sdelta

	What is sdelta?

sdelta is a line blocking dictionary compressor.
Line blocking means that it makes blocks out of lines,
and therefore blocks are often aligned on line boundaries.
Dictionary compressor means that it uses the first file as a dictionary.
The second file can be expressed as a combination 
of references into the dictionary file and the 
blocks that are not in the dictionary file.
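
The idea of expressing the second file as references plus new blocks can be sketched in a few lines of Python. This is only an illustration: the `("copy", offset, length)` and `("add", data)` tuples and the `apply_delta` name are hypothetical, and sdelta's real patch format is not described in this file.

```python
def apply_delta(dictionary: bytes, ops) -> bytes:
    """Rebuild the second file from the dictionary file plus a list of ops.

    Hypothetical op shapes, for illustration only:
      ("copy", offset, length)  reference into the dictionary file
      ("add", data)             block that is not in the dictionary file
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += dictionary[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


dictionary = b"alpha\nbeta\ngamma\n"
ops = [
    ("copy", 0, 6),        # "alpha\n" taken from the dictionary
    ("add", b"BETA2\n"),   # new line, shipped literally
    ("copy", 11, 6),       # "gamma\n" taken from the dictionary
]
message = apply_delta(dictionary, ops)
print(message)
```

Only the `add` payloads and the small `copy` triples need to travel; the bulk of the second file comes from data the receiver already has.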


	Why sdelta?

sdelta has one primary purpose.
It is used to minimize bytes transferred to update 
previously downloaded source tarballs to current versions.
sdelta is designed to create tight deltas of text files.
Tarballs for sources are full of text files.
sdelta works extremely well on these.

Version releases of source tarballs are often 
85% or more identical to the previous release.
When upgrading installed software, why download 
completely new source tarballs when the box 
already has source tarballs for previous versions?
The previously downloaded source tarballs already contain 
85% to 90% of the same data as the new source tarballs.
Why not download only the new data and 
references to the data one already has?
sdelta upgrade patches can be about 10% of the size of full 
source tarballs and so transfer roughly 10 times faster.

	How does sdelta work?

sdelta attempts to find long matching sequences 
of line aligned bytes in both files.
It outputs references to the data both files have 
in common and the new data in the second file.
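
A much simplified sketch of line aligned matching follows. It is not sdelta's actual algorithm, which this file does not spell out; it is a greedy longest-run search written for illustration, and `make_delta` and the op tuples are hypothetical names.

```python
from collections import defaultdict


def make_delta(dictionary: bytes, message: bytes):
    """Greedy line aligned delta: a simplified sketch, not sdelta's real matcher."""
    d_lines = dictionary.splitlines(keepends=True)
    m_lines = message.splitlines(keepends=True)

    # Byte offset at which each dictionary line starts (plus the end offset).
    offsets = [0]
    for line in d_lines:
        offsets.append(offsets[-1] + len(line))

    # Map line content to every dictionary line number holding it.
    index = defaultdict(list)
    for i, line in enumerate(d_lines):
        index[line].append(i)

    ops, pending, j = [], bytearray(), 0
    while j < len(m_lines):
        best_start, best_len = -1, 0
        for i in index.get(m_lines[j], ()):
            n = 0
            while (i + n < len(d_lines) and j + n < len(m_lines)
                   and d_lines[i + n] == m_lines[j + n]):
                n += 1
            if n > best_len:
                best_start, best_len = i, n
        if best_len:
            if pending:  # flush new data seen before this match
                ops.append(("add", bytes(pending)))
                pending.clear()
            start, end = offsets[best_start], offsets[best_start + best_len]
            ops.append(("copy", start, end - start))
            j += best_len
        else:
            pending += m_lines[j]  # line not present in the dictionary
            j += 1
    if pending:
        ops.append(("add", bytes(pending)))
    return ops


dictionary = b"a\nb\nc\nd\n"
message = b"a\nb\nX\nd\n"
ops = make_delta(dictionary, message)
rebuilt = b"".join(
    dictionary[op[1]:op[1] + op[2]] if op[0] == "copy" else op[1] for op in ops
)
print(ops, rebuilt == message)
```

Because every match starts and ends on a line boundary, a changed line costs one literal block and matching resumes on the very next line.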


	How well does sdelta perform compared to xdelta?

sdelta is designed specifically for creating 
deltas of uncompressed tar files of sources.
Therefore, it is extremely good at it.
However, using a dynamic block size and an
advanced pattern matching algorithm requires 
sdelta to use more time and memory than 
xdelta, which uses a fixed block size.
xdelta is available from http://xdelta.sourceforge.net
sdelta is currently half as fast as xdelta.

The following is a comparison of sdelta and xdelta 
on a very large dictionary file and message file.
The numbers with current sdelta may vary as it has 
undergone some optimizations to make it more suitable
for matching against dissimilar dictionary files.

bytes		file name
197888000	linux-2.6.7.tar		<- dictionary file
200847360	linux-2.6.8.1.tar	<- message    file

2060961		linux-2.6.7-linux-2.6.8.1.tar.sdelta.bz2
3190401		linux-2.6.7-linux-2.6.8.1.tar.xdelta.bz2

In the above comparison both the xdelta and sdelta patch
files were generated with compression disabled.
Finally, each delta file was compressed with bzip2.

In this example a 3 megabyte xdelta is far faster 
to download than a 34 megabyte linux-2.6.8.1.tar.bz2.
However, the 2 megabyte bzip2 compressed sdelta 
file is about one third smaller, and so about one third 
quicker to transfer, than the xdelta for the above files.
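
The arithmetic behind these figures can be checked directly from the byte counts listed above. The variable names below are just labels for this sketch.

```python
# Byte counts copied from the comparison above.
message_tar = 200847360   # linux-2.6.8.1.tar
sdelta_bz2 = 2060961      # linux-2.6.7-linux-2.6.8.1.tar.sdelta.bz2
xdelta_bz2 = 3190401      # linux-2.6.7-linux-2.6.8.1.tar.xdelta.bz2

# Fraction by which the sdelta patch is smaller than the xdelta patch.
saving_vs_xdelta = 1 - sdelta_bz2 / xdelta_bz2

# Size of the sdelta patch relative to the uncompressed message file.
share_of_tar = sdelta_bz2 / message_tar

print(f"sdelta patch is {saving_vs_xdelta:.1%} smaller than the xdelta patch")
print(f"sdelta patch is {share_of_tar:.2%} of the uncompressed tar")
```

The saving works out to a little over one third, and the patch is close to 1% of the uncompressed message file.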

Replicating the above example can yield:

6.9M linux-2.6.7-linux-2.6.8.1.tar.xdelta  or
3.2M linux-2.6.7-linux-2.6.8.1.tar.xdelta  if gzip compression enabled.
and
13M linux-2.6.7-linux-2.6.8.1.tar.sdelta

Now type:
bzip2 -9 linux-2.6.7-linux-2.6.8.1.tar.sdelta
Like sorcery the 13M file shrinks down to 2M.

13M  sdelta bzip2 compressed becomes 2M
6.9M xdelta bzip2 compressed becomes 3M

Do the numbers seem counterintuitive?
Fortunately, the explanation is not complex.

xdelta and sdelta are both delta generators 
and also both dictionary compressors.
The uncompressed xdelta is smaller than 
the uncompressed sdelta because the xdelta 
contains more references to matching data 
than the sdelta does.

References to matching data are a form of compression.
Therefore, the reference portion of a delta is nearly
incompressible by advanced compression programs such as bzip2.
The bzip2 of the sdelta patch compressed better 
because it has less reference data and more 
compressible data than the xdelta patch file.
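
The effect can be reproduced in miniature with Python's bz2 module. The data here is synthetic and only an assumption for illustration: packed random (offset, length) pairs stand in for the reference portion of a delta, and repetitive C-like lines stand in for literal source text. Neither is real sdelta or xdelta output.

```python
import bz2
import random
import struct

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for reference data: packed 4-byte offsets and 2-byte lengths.
refs = b"".join(
    struct.pack("<IH", rng.randrange(2**32), rng.randrange(2**16))
    for _ in range(5000)
)

# Stand-in for new, literal source text carried inside a delta.
text = "".join(
    f"static int field_{i % 50} = {i};\n" for i in range(5000)
).encode("ascii")


def ratio(data: bytes) -> float:
    """Compressed size divided by original size at bzip2 level 9."""
    return len(bz2.compress(data, 9)) / len(data)


refs_ratio = ratio(refs)
text_ratio = ratio(text)
print(f"reference-like data: {refs_ratio:.2f} of original size")
print(f"source-like text:    {text_ratio:.2f} of original size")
```

The reference-like bytes barely shrink, while the text shrinks to a small fraction of its size, which is why a delta that carries more literal text can end up smaller after bzip2.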

The compressed sdelta is smaller than the compressed xdelta,
because the quality of the fewer matches in the sdelta is 
better than the quality of the many matches in the xdelta.
Quality over quantity prevails.

Here is how sdelta and xdelta compare on medium and small source tarballs:

85M  glibc-2.3.2.tar			<- message file
1.1M glibc-2.3.2-2.3.3.tar.sdelta.bz2	<- 1.1M bzip2 sdelta patch file
1.8M glibc-2.3.2-2.3.3.tar.xdelta.bz2	<- 1.8M bzip2 xdelta patch file

3.0M   polypaudio-0.4.tar			<- message file
34104  polypaudio-0.3-0.4.tar.sdelta.bz2	<- 25% smaller bzip2 sdelta
45088  polypaudio-0.3-0.4.tar.xdelta.bz2	<- larger bzip2 xdelta


	Is something this good free?

Certainly sdelta is free!
The license that governs the use, modification 
and redistribution of sdelta is enclosed as LICENSE.

sdelta would not be as good without other freely available 
high quality software like gzip and bzip2 compressors,
and of course rsync which inspired xdelta which inspired sdelta.

Although sdelta is free, please consider yourself morally obligated 
to help the free software revolution by contributing time, effort, or 
your own high quality free software that you design and implement.

sdelta is implemented to compile and run on 
glibc Linux boxes of any endian architecture.
sdelta comes with no warranty.
If it breaks then you can keep the pieces.
Provide bug reports if you want fixes.
Send them to [email protected].
Send improvements and code optimizations as 
well if you want it to go faster for everyone.

sdelta was written without copying source code from other sources.
sdelta is not a fork, clone, or copy of any other software project.
Although similar in goal to xdelta, the approach of using dynamically
sized line aligned blocks is a radical departure from xdelta's fixed
block size.

My thanks to Joshua MacDonald, because without xdelta 
I might not have decided to design and implement sdelta.
sdelta is not a better xdelta.
However, it does generate better deltas for source tarballs.
xdelta is still the king of delta generators.  Hail to the king.
sdelta is merely the duke of text delta generators.