The program applies the Huffman Coding algorithm to compress text files encoded with ASCII. It obtains the frequency of each character found in the input file, and generates a priority queue in a way that the most frequent characters are closer to the root node than the characters that appear less frequently in the file. As a result, a table that maps characters to their respective binary codes (0 for left nodes, and 1 for right nodes) is generated.
Huffman coding reduces the original file size by writing bits based on the relative character frequency, so that the most frequent characters will be written with fewer bits (as opposed to ASCII encoding where any character represents 1 byte of memory). The program also stores the Huffman tree generated when compressing a file as a header in the output (compressed) file, using pre-order traversal for maximum space-savings. This facilitates the process of unzipping any file that was first compressed with this program, since the Huffman tree can be recreated, and the sequence of bits can be turned into characters by constantly traversing the tree until a leaf node is reached (the characters are stored in leaf-nodes).
Java 7 or above.
The program compresses .txt files, and the output files should also have a .txt extension.
To compress a file:
java -jar Huffman.jar compress [inputfile.txt] [compressedfile.txt]
- Example:
java -jar Huffman.jar compress PrideAndPrejudice.txt CompressedFile.txt
To decompress a file generated by this program:
java -jar Huffman.jar decompress [compressedfile.txt] [originalfile.txt]
- Example:
java -jar Huffman.jar decompress CompressedFile.txt OriginalFile.txt
Documentation is offered under docs/index.html
Or you can see it here
- Andres Beltran - B.S. Computer Science candidate. CWRU 2021
This project is licensed under the MIT License - see the LICENSE file for details