Either library (or both) can be included in (or linked to) applications.
Files:
The figure below shows the basic structure of a compressed block, or segment.
For small or medium-sized files, a single segment may be all that is needed.
The magic numbers help check that the file is the right type, and that
indexing and contents are as expected/needed.
SCZ works on an iterative basis. After the first two magic numbers (101, 98),
the iteration count for the block is stored. This says how many times the block
was compressed, and therefore how many times decompression must be re-applied.
Next come 24 bits (3 bytes) that hold the size of the block data. This avoids
reserving a special end-of-block character. Next is the forcing character. It
must be inserted ahead of any special symbol to indicate that the next character
is to be taken literally, not interpreted (like an escape character). Next is
the symbol table, starting with the table's size (count). Each table entry
consists of a marker character followed by the pair (or phrase) that it replaces.
The table is followed by a magic barrier character (91) to ensure that the
compressed data segment begins at the right point. All this is followed by the
segment's original checksum and a continuation or end marker.
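As an illustration only, the field order described above could be read with a routine like the following sketch. It is not taken from the SCZ sources. The names scz_header_sketch and scz_parse_header_sketch are hypothetical, and several details the text does not state are assumed: a one-byte iteration count, a size stored most-significant byte first, a one-byte table count, and three-byte table entries (one marker plus a two-byte pair).

```c
#include <assert.h>
#include <stddef.h>

/* Fields at the front of one segment, in the order the text lists them.
   This struct is a hypothetical illustration, not an SCZ type. */
typedef struct {
    unsigned char iterations;   /* compression passes to undo */
    unsigned long data_size;    /* 24-bit size of the block data */
    unsigned char force_char;   /* escape-like forcing character */
    unsigned char table_count;  /* number of symbol-table entries */
    size_t table_offset;        /* offset of the first table entry */
    size_t data_offset;         /* offset of the first byte after the barrier */
} scz_header_sketch;

/* Returns 0 on success, -1 if the buffer is too short or a magic value
   does not match.
   ASSUMPTIONS (not stated in the text): iteration count and table count
   are one byte each, the 24-bit size is stored most-significant byte
   first, and every table entry is 3 bytes (marker + 2-byte pair). */
int scz_parse_header_sketch(const unsigned char *buf, size_t len,
                            scz_header_sketch *out)
{
    size_t pos, table_bytes;

    assert(buf != NULL && out != NULL);

    /* 2 magic + 1 iteration count + 3 size + 1 forcing char + 1 table count */
    if (len < 8) return -1;
    if (buf[0] != 101 || buf[1] != 98) return -1;

    out->iterations  = buf[2];
    out->data_size   = ((unsigned long)buf[3] << 16)
                     | ((unsigned long)buf[4] << 8)
                     |  (unsigned long)buf[5];
    out->force_char  = buf[6];
    out->table_count = buf[7];

    pos = 8;
    out->table_offset = pos;
    table_bytes = (size_t)out->table_count * 3;

    /* The table and the barrier byte must both fit in the buffer. */
    if (len - pos < table_bytes + 1) return -1;
    pos += table_bytes;

    /* The magic barrier (91) marks the start of the compressed data. */
    if (buf[pos] != 91) return -1;
    out->data_offset = pos + 1;
    return 0;
}
```

A caller would pass the start of a segment and its length, then use data_offset to find the compressed data. The checksum and the continuation or end marker are left out because the text does not give their sizes or values.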
The next figure shows how the basic segment structure can be chained.
This permits large files to be processed in sections, or data to be
streamed for dynamic processes.
Notes:
An integral 'blocking' capability was added to the SCZ format.
It enables SCZ routines to process large files or buffers in smaller segments.
This limits the dynamic memory required to a constant, regardless of the
file size. It also provides more efficient compression, because replacement
symbols can better match the redundancy statistics of smaller portions of
large files. And it enables streaming operations - the ability to
compress/decompress a little bit at a time, continuously, if need be.
Also, if some bytes in a compressed file are corrupted, it may allow
partial recovery of the good blocks.
The 'blocking-size' is arbitrary. It defaults to 4 MB, but can be set
smaller or larger by changing the value of the "sczbuflen" variable.
Reducing block size yields greater compression, but consumes more time.
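To make the blocking arithmetic concrete, here is a minimal sketch of how a buffer divides into blocks of at most a given length. It does not call any SCZ routine. The function names are hypothetical, and block_len merely stands in for whatever value "sczbuflen" holds.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks needed to cover 'total' bytes when each block holds
   at most 'block_len' bytes. An empty buffer needs no blocks. */
size_t block_count_sketch(size_t total, size_t block_len)
{
    assert(block_len > 0);
    return total / block_len + (total % block_len != 0);
}

/* Length of block number 'index' (0-based). Every block is full-sized
   except possibly the last one, which holds the remainder. */
size_t block_length_sketch(size_t total, size_t block_len, size_t index)
{
    size_t start;

    assert(block_len > 0);
    assert(index < block_count_sketch(total, block_len));

    start = index * block_len;
    return (total - start < block_len) ? total - start : block_len;
}
```

With the default 4 MB block size (4194304 bytes), a 10 MB buffer (10485760 bytes) would be split into three blocks: two of 4194304 bytes and a final one of 2097152 bytes. Only one block needs to be held in working memory at a time, which is what keeps the memory requirement constant.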
By testing SCZ routines with thousands of different files of various
sizes, we can gain confidence in SCZ's correctness and efficiency.
The regression tests can be quickly re-run whenever any improvements
to SCZ are proposed, to verify that it continues to work properly.
The tests also provide additional examples of how to call the compression
routines.