jgz - Usage

Classes

The Inflater and Deflater classes provide raw DEFLATE algorithm implementations. They are independent of each other. The zlib and gzip file formats are supported through wrapper streams.

To use jgz for the implementation of SSH or SSL, you may use either the raw classes or the zlib file wrappers (ZlibInputStream and ZlibOutputStream). Either way, you will need to read data on a block basis, i.e. "partial" mode with the ZlibInputStream class. When uncompressing records without blocking, some care must be taken about where to stop processing, because the inflater code tends to decode whole blocks. Please refer to the documentation on the zlib flush modes for more details.

When deflating, the DeflaterStateMachine class can be used if you prefer a non-blocking array-based API instead of the stream-based API that Deflater uses. DeflaterStateMachine is closer to what Java provides in java.util.zip.Deflater. In particular, this class may produce either raw DEFLATE blocks, or data in zlib format with header and final footer (the 32-bit Adler-32 checksum). A similar API for the inflater code is planned for the next version of jgz.
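The difference between raw DEFLATE output and zlib-format output can be illustrated with the JDK's own java.util.zip.Deflater, which the paragraph above cites as the closest analogue. The sketch below is a stand-in only: it does not use the jgz API, and the class and method names (FramingSketch, deflate, footer, adler32) are made up for the illustration. It checks that the last four bytes of the zlib-format output are the Adler-32 checksum of the uncompressed data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.Deflater;

// Stand-in sketch: uses java.util.zip.Deflater, NOT the jgz classes.
final class FramingSketch {

    // nowrap=true yields raw DEFLATE blocks; nowrap=false yields zlib
    // format (2-byte header, DEFLATE blocks, 4-byte Adler-32 footer).
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    // Reads the big-endian 32-bit footer at the end of a zlib stream.
    static long footer(byte[] zlib) {
        int p = zlib.length - 4;
        return ((zlib[p] & 0xFFL) << 24) | ((zlib[p + 1] & 0xFFL) << 16)
             | ((zlib[p + 2] & 0xFFL) << 8) | (zlib[p + 3] & 0xFFL);
    }

    // Adler-32 checksum of the uncompressed data.
    static long adler32(byte[] data) {
        Adler32 a = new Adler32();
        a.update(data, 0, data.length);
        return a.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] raw = deflate(input, true);
        byte[] zlib = deflate(input, false);
        System.out.println("raw=" + raw.length + " bytes, zlib=" + zlib.length + " bytes");
        System.out.println("footer is Adler-32: " + (footer(zlib) == adler32(input)));
    }
}
```

An SSH or SSL implementation would normally keep one such stream open per direction rather than finishing it after each message; the one-shot form here only serves to show the framing.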

Compression Levels

jgz supports four compression levels when deflating:

HUFF (1): do not apply LZ77 compression; use only Huffman codes. This is the fastest mode, but the compression ratio is rarely good.
SPEED (2): try limited LZ77 compression. This mode favours processing speed over compression ratio.
MEDIUM (3): average LZ77 compression. This mode is a trade-off between processing speed and compression ratio (slightly biased towards compression). It is the default, and is roughly equivalent to the default zlib/gzip compression level.
COMPACT (4): favour compression ratio over processing speed.

For most uses, MEDIUM is fine, although SPEED may be preferred. Remember that nothing replaces a benchmark using actual data. Note that the compression level affects the output of the deflater, but that output is always compatible with any compliant inflater. This also explains why the inflater has no "compression level" setting.

Wherever a compression level can be specified, using the value 0 selects the default compression level (currently MEDIUM).
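The claim that any level yields output readable by any compliant inflater can be checked with a quick benchmark-style sketch. It uses java.util.zip as a stand-in, not jgz; the mapping of the four jgz levels onto java.util.zip settings (HUFFMAN_ONLY strategy, BEST_SPEED, DEFAULT_COMPRESSION, BEST_COMPRESSION) is an assumption made for illustration, and LevelSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch built on java.util.zip, NOT on the jgz classes.
final class LevelSketch {

    // {level, strategy} pairs; the pairing with jgz level names is an assumption.
    static final int[][] SETTINGS = {
        {Deflater.DEFAULT_COMPRESSION, Deflater.HUFFMAN_ONLY},     // roughly HUFF
        {Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY},          // roughly SPEED
        {Deflater.DEFAULT_COMPRESSION, Deflater.DEFAULT_STRATEGY}, // roughly MEDIUM
        {Deflater.BEST_COMPRESSION, Deflater.DEFAULT_STRATEGY},    // roughly COMPACT
    };

    static byte[] compress(byte[] input, int level, int strategy) {
        Deflater d = new Deflater(level);
        d.setStrategy(strategy);
        try {
            d.setInput(input);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    // The inflater takes no level parameter: it decodes whatever it is given.
    static byte[] decompress(byte[] data) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }

    // Repetitive sample text; replace with real data for a meaningful benchmark.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("the quick brown fox jumps over the lazy dog\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sample();
        for (int[] s : SETTINGS) {
            byte[] z = compress(data, s[0], s[1]);
            System.out.println("level=" + s[0] + " strategy=" + s[1]
                + " -> " + z.length + " bytes, round-trip ok="
                + java.util.Arrays.equals(decompress(z), data));
        }
    }
}
```

The sizes printed depend on the input, which is why the sample data should be swapped for data representative of the actual application before drawing conclusions.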

Buffering

The Inflater class accesses the compressed data by reading the provided stream byte by byte (except when decoding an uncompressed block, in which case chunk reads will be performed). It follows that the stream which provides the compressed data should use some kind of buffering. The same applies to the wrappers around the inflater code (GZipInputStream and ZlibInputStream). Please note that the inflater will never read a single byte more than what is strictly necessary.
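The effect of buffering on a byte-by-byte consumer can be measured without any compression code at all. In the sketch below, drainByteByByte merely imitates the access pattern described above (it is not the jgz Inflater), and CountingInputStream is a hypothetical helper that counts how many read calls reach the transport stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferingSketch {

    // Counts how many read calls reach the wrapped (transport) stream.
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Imitates a consumer that pulls its input one byte at a time.
    static int drainByteByByte(InputStream in) throws IOException {
        int total = 0;
        while (in.read() != -1) {
            total++;
        }
        return total;
    }

    // Returns the number of read calls issued on the transport stream.
    static int transportCalls(int size, boolean buffered) {
        CountingInputStream transport =
            new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream source = buffered
            ? new BufferedInputStream(transport, 8192)
            : transport;
        try {
            drainByteByByte(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return transport.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + transportCalls(65536, false));
        System.out.println("buffered calls:   " + transportCalls(65536, true));
    }
}
```

With a file or socket as the transport, each of those calls may cost a system call, so the BufferedInputStream wrapper is usually worth it.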

The Deflater class outputs compressed data in bursts; this kind of buffering is inherent to the DEFLATE algorithm (a block of symbols must be known before deciding whether it is worth using dynamic Huffman codes). Moreover, the Deflater class internally features a 4-kilobyte buffer so that only chunk write calls are issued on the transport stream. Hence, the transport stream need not be buffered.

The DeflaterStateMachine can also be used instead of Deflater, in order to get data in arrays instead of streams; that API is most useful to applications which operate asynchronously (i.e. where the deflating process must not block and where callbacks are inappropriate). Internally, DeflaterStateMachine uses a Deflater instance.
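The array-based style of operation looks roughly as follows. Since the DeflaterStateMachine method names are not given here, the sketch uses java.util.zip.Deflater as a stand-in: input arrives in slices, each call compresses into a fixed-size output array and returns without waiting on any stream. ArrayApiSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch: java.util.zip.Deflater, NOT jgz's DeflaterStateMachine.
final class ArrayApiSketch {

    // Compresses input slices as they "arrive", draining into a fixed array.
    static byte[] deflateSlices(byte[][] slices, int outSize) {
        Deflater d = new Deflater();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] out = new byte[outSize];
        try {
            for (byte[] slice : slices) {
                d.setInput(slice);
                // Each deflate call fills at most outSize bytes and returns.
                while (!d.needsInput()) {
                    int n = d.deflate(out);
                    sink.write(out, 0, n);
                }
            }
            d.finish(); // no more input: emit the final block and footer
            while (!d.finished()) {
                int n = d.deflate(out);
                sink.write(out, 0, n);
            }
            return sink.toByteArray();
        } finally {
            d.end();
        }
    }

    // Plain inflate helper, used only to verify the round trip.
    static byte[] inflate(byte[] data) {
        Inflater inf = new Inflater();
        try {
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) {
        byte[][] slices = {
            "alpha ".getBytes(StandardCharsets.UTF_8),
            "beta ".getBytes(StandardCharsets.UTF_8),
            "gamma".getBytes(StandardCharsets.UTF_8),
        };
        byte[] z = deflateSlices(slices, 16);
        System.out.println(new String(inflate(z), StandardCharsets.UTF_8));
    }
}
```

In a real asynchronous application the two inner loops would be driven by the event loop (one step per "output buffer available" event) instead of running back to back.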

The deflating process can be flushed, using the partial, sync or full flush modes. Which mode to use depends on the outer protocol; see the documentation on the zlib flush modes for more information on this issue.
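To make the purpose of flushing concrete, here is a minimal sketch of record-by-record exchange over one long-lived compressed stream. It relies on java.util.zip as a stand-in for jgz and shows only the sync flush mode (java.util.zip offers sync and full flush, but no partial flush). FlushSketch, writeRecord and readRecord are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Stand-in sketch: java.util.zip with SYNC_FLUSH, NOT the jgz flush API.
final class FlushSketch {

    // Compresses one record and sync-flushes, so that the receiver can
    // decode the whole record without waiting for later data.
    static byte[] writeRecord(Deflater d, byte[] record) {
        d.setInput(record);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        int n;
        do {
            n = d.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length); // a full buffer means more may be pending
        return out.toByteArray();
    }

    // Decodes everything contained in one flushed chunk of wire data.
    static byte[] readRecord(Inflater inf, byte[] wire) {
        try {
            inf.setInput(wire);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            int n;
            while ((n = inf.inflate(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Deflater d = new Deflater();
        Inflater inf = new Inflater();
        try {
            for (String msg : new String[] {"first record", "second record"}) {
                byte[] wire = writeRecord(d, msg.getBytes(StandardCharsets.UTF_8));
                byte[] back = readRecord(inf, wire);
                System.out.println(wire.length + " wire bytes -> "
                    + new String(back, StandardCharsets.UTF_8));
            }
        } finally {
            d.end();
            inf.end();
        }
    }
}
```

The stream is never finished here: both sides keep their compression state across records, which is what lets later records benefit from earlier ones.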

Memory Footprint

Each Inflater instance allocates a 32-kilobyte buffer (that is the DEFLATE "sliding window"). A few more kilobytes (about a dozen) are allocated for the processing of Huffman trees; Inflater allocates new arrays for the Huffman trees for each DEFLATE block, which means that some garbage collector activity is expected when inflating a long stream. Keeping the buffers around and reusing them makes the code a bit more complex, but does not seem to improve performance; one could argue that by allocating the arrays dynamically, jgz allows them to be released while processing a block (these arrays are transient and not needed during the bulk of the inflation job).

The Deflater class allocates about 400 kilobytes of data for each instance. This includes 256 kilobytes for the sliding window and management structures (hash table for sequences), and 128 kilobytes for the internal symbol buffering. Memory footprint can be reduced by using a smaller window; e.g., using a 14-bit window instead of the standard 15-bit window should save 128 kilobytes. Reducing the window also speeds up compression a bit, but it lowers the compression ratio; the produced streams remain compatible with compliant inflaters.

Dependencies

For size-constrained applications, some classes may be omitted. The dependencies between classes are listed here.

Raw inflater code uses Inflater, JGZException and Trees.
Uncompression from gzip files uses Inflater, JGZException, Trees, GZipDecoder, GZipInputStream, CRC and CRCInputStream.
Uncompression from zlib files uses Inflater, JGZException, Trees, ZlibInputStream and Adler32.
Raw deflater code uses Deflater and Trees.
Compression into gzip files uses Deflater, Trees, GZipOutputStream and CRC.
Compression into zlib files (without the state machine API) uses Deflater, Trees, ZlibOutputStream and Adler32.
The state machine API uses Deflater, Trees, DeflaterStateMachine and Adler32. Note that the state machine may produce zlib files, independently of ZlibOutputStream.

Caveats

The COMPACT level does not live up to its promises. Occasionally, it results in larger files than the MEDIUM level.

The inflater code is a bit tricky to use with regard to flush modes: the application must use block-based reading (i.e. the "partial" mode of ZlibInputStream) and must refrain from calling the inflater when at most one input byte remains unprocessed. A state machine API is planned for the next version (this requires considerable internal rewriting of the inflater code).