Ready to encode.
Efficacy of the Compressor: The compression ratio (full size divided by compressed size) of any one compression is displayed in the output, and can range from .33 (for a single character) to 8 (for an infinitely-long repetition of a single character). This utility performs optimal character-wise compression - that is, for any set of randomly generated characters (even with different weights for each character), a Huffman Coding will perform maximal compression. However, for non-random strings of characters, a Huffman Coding will not perform maximal compression, and is thus only one part of modern text/binary compression algorithms.
Self-Containment: Technically speaking, a Huffman Coding is only a set of rules for compressing data. Likewise, it is possible to have encoded data without the coding itself, which can be useful in cases where all applicable data is relatively similar. This implementation, however, generates a new coding for each input, and the compressed output includes both the coding and the encoded data. This way, the output is entirely self-contained, and can be decompressed using this tool at any time, on any machine.
Encoding Protocol: To achieve the self-containment mentioned above, this utility has a protocol for returning the Huffman Coding and compressed data as a single binary string. This binary string starts with padding zeros (as necessary to fill out the output style), and then a one, which signals that the coding will begin on the next bit. At this point, the root node of the Huffman Tree is created, and a traversal (left to right) is begun there. Thereafter, if the next bit is a one, the next child becomes a leaf node which contains the next 8 bits (which are then skipped) as data; otherwise, it is a branch, and the traversal moves down to it until it is completed. Once the tree is complete, the code for any byte can be found by taking the path from the tree's root to the data's leaf (i.e. left right right) and making each 'left' a zero and each 'right' a one. From there, a dictionary describing the relationship between bytes and codes is created, and then used to encode the input bytes; this encoding is appended to the coding itself.