shorten -x [-hl] [ -a #bytes] [-d #bytes] [shortened-file [waveform-file]]
shorten [ -s | -S<name> ] shortened-file
shorten reduces the size of waveform files (such as audio) using Huffman coding of prediction residuals and optional additional quantisation. In lossless mode the amount of compression obtained depends on the nature of the waveform. Those composed of low frequencies and low amplitudes give the best compression, which may be 2:1 or better. Lossy compression is selected by specifying a minimum acceptable segmental signal to noise ratio or a maximum bit rate. It operates by zeroing the lower order bits of the waveform, so retaining waveform shape.
If both file names are specified then these are used as the input and output files. The first file name can be replaced by "-" to read from standard input and likewise the second filename can be replaced by "-" to write to standard output. Under UNIX, if only one file name is specified, then that name is used for input and the output file name is generated by adding the suffix ".shn" on compression and removing the ".shn" suffix on decompression. In these cases the input file is removed on completion. The use of automatic file name generation is not currently supported under DOS. If no file names are specified, shorten reads from standard input and writes to standard output. Whenever possible, the output file inherits the permissions, owner, group, access and modification times of the input file.
From release 2.3 the RIFF WAVE (Microsoft .wav) file type is the default. These files contain enough information to set most of the switches presented below, so effective operation is obtained just by setting the desired level of compression (-n or -r switch).
Decompression time is normally about twice that of the default polynomial interpolation. For versions 0 and 1, compression time is linear in the specified maximum order, as all lower values are searched for the greatest expected compression (the number of bits required to transmit the prediction residual is monotonically decreasing with prediction order, but transmitting each filter coefficient requires about 7 bits). For version 2 and above, the search starts at zero order and terminates when the last two prediction orders give a larger expected bit rate than the minimum found to date. This is a reasonable strategy for many real-world signals; you may revert to the exhaustive algorithm by setting -v1 to check that this works for your signal type.
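The two search strategies described above can be compared with a small sketch. This is an illustration only, not the code in shorten.c: the function names and the expected_bits callable are hypothetical, and the callable is assumed to return the expected bit count for a given prediction order (residual bits plus the cost of the filter coefficients).

```python
def exhaustive_search(expected_bits, max_order):
    """Versions 0 and 1 as described: try every order up to max_order."""
    costs = [expected_bits(order) for order in range(max_order + 1)]
    best_order = min(range(len(costs)), key=costs.__getitem__)
    return best_order, len(costs)


def early_stop_search(expected_bits, max_order):
    """Version 2 as described: stop once the last two orders tried both
    cost more than the minimum found so far.

    Returns (best order, number of orders evaluated).
    """
    best_order, best_bits = 0, expected_bits(0)
    evaluated = 1
    worse_in_a_row = 0
    for order in range(1, max_order + 1):
        bits = expected_bits(order)
        evaluated += 1
        if bits < best_bits:
            best_order, best_bits = order, bits
            worse_in_a_row = 0
        elif bits > best_bits:
            worse_in_a_row += 1
            if worse_in_a_row == 2:
                break
        else:
            # Equal to the minimum: not "larger", so the run is reset.
            worse_in_a_row = 0
    return best_order, evaluated


# Hypothetical cost table with a late minimum that the early stop misses.
table = [100, 90, 95, 96, 50, 60]
print(exhaustive_search(table.__getitem__, 5))   # (4, 6)
print(early_stop_search(table.__getitem__, 5))   # (1, 4)
```

The made-up table shows the trade-off: the early stop evaluates fewer orders but can miss a later minimum, which is the case that setting -v1 lets you check for.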
The simple types are listed first and have an initial s or u for signed or unsigned data, followed by 8 or 16 as the number of bits per sample. No further extension means the data is in the natural byte order, a trailing x specifies byte swapped data, hl explicitly states the byte order as high byte followed by low byte and lh the converse. Hence s16 means signed 16 bit integers in the natural byte order (as C's fwrite() would write shorts).
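The naming scheme for the simple types can be illustrated by decoding raw bytes with Python's struct module. This is a sketch of the naming convention only: the helper names struct_format and unpack_samples are hypothetical and are not part of shorten.

```python
import struct
import sys


def struct_format(type_name):
    """Map a simple type name such as "s16hl" to a struct format string."""
    sign, rest = type_name[:1], type_name[1:]
    if sign not in ("s", "u"):
        raise ValueError("unknown type: " + type_name)
    if rest == "8":
        return "b" if sign == "s" else "B"
    if not rest.startswith("16"):
        raise ValueError("unknown type: " + type_name)
    native = "<" if sys.byteorder == "little" else ">"
    swapped = ">" if native == "<" else "<"
    # No suffix: natural order; x: swapped; hl: high byte first; lh: low first.
    orders = {"": native, "x": swapped, "hl": ">", "lh": "<"}
    suffix = rest[2:]
    if suffix not in orders:
        raise ValueError("unknown type: " + type_name)
    return orders[suffix] + ("h" if sign == "s" else "H")


def unpack_samples(data, type_name):
    """Decode whole samples from data; trailing partial samples are ignored."""
    fmt = struct_format(type_name)
    size = struct.calcsize(fmt)
    count = len(data) // size
    return [struct.unpack_from(fmt, data, i * size)[0] for i in range(count)]


raw = bytes([0x01, 0x02, 0xFF, 0xFE])
print(unpack_samples(raw, "s16hl"))  # [258, -2]
print(unpack_samples(raw, "s16lh"))  # [513, -257]
print(unpack_samples(raw, "u8"))     # [1, 2, 255, 254]
```

The result for s16 and s16x depends on the machine running the code, which is why the explicit hl and lh forms are the safer choice for files moved between systems.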
ulaw is the natural file type of ulaw encoded files (such as the default Sun .au files) and alaw is a similar byte-packed scheme. Specific optimisations are applied to ulaw and alaw files. If lossless compression is specified with ulaw files then a check is made that the whole dynamic range is used (useful for files recorded on a SPARCstation with the volume set too high). Lossless coding of both file types uses an internal format with a monotonic mapping to linear. If lossy compression is specified then the data is internally converted to linear. The lossy option "-r4" has been observed to give little degradation and provides 2:1 compression.
With the types listed above you should explicitly set the number of channels (if not mono) with -c, and if the file contains a header its size should be specified with -a. This is most important for lossy compression, where a file header that is inadvertently lossy coded will be corrupted.
Finally, as of version 2.3, the file type may be specified as wav (the default). In this case the file to be compressed is interrogated for the specific data type (chosen from the above) and the number of channels to be used. The header length alignment (-a flag) is also automatic, so lossless compression requires no switches to be set and lossy compression requires only that the compression level be set with -n or -r.
shorten works by blocking the signal, making a model of each block in order to remove temporal redundancy, then Huffman coding the quantised prediction residual.
Four functions are computed, corresponding to the signal, difference signal, second and third order differences. The one with the lowest variance is coded. The variance is measured by summing absolute values for speed and to avoid overflow.
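The selection rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in shorten.c: the function names are hypothetical, and samples before the start of the block are taken to be zero, whereas a real coder would normally carry history over from the previous block.

```python
def difference_signals(block):
    """Return [signal, 1st, 2nd, 3rd order differences] for one block.

    Assumption: the sample preceding the block is treated as zero.
    """
    signals = [list(block)]
    for _ in range(3):
        prev = signals[-1]
        # diff[i] = prev[i] - prev[i - 1], with prev[-1] taken as 0
        signals.append([x - y for x, y in zip(prev, [0] + prev[:-1])])
    return signals


def select_order(block):
    """Pick the difference order with the smallest sum of absolute values.

    Summing absolute values stands in for the variance, as described above.
    Returns (chosen order, list of the four costs).
    """
    costs = [sum(abs(v) for v in s) for s in difference_signals(block)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs


# A linear ramp: differencing removes most of the energy.
print(select_order([10, 12, 14, 16, 18, 20]))  # (2, [90, 20, 18, 36])
```

The non-zero values at the start of each difference signal in this example come from the zero-history assumption, not from the signal itself.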
It is assumed the signal has the Laplacian probability density function of exp(-abs(x)). There is a computationally efficient way of mapping this density to Huffman codes. The code is in four parts: a run of zeros; a bounding one; a mantissa of a fixed number of bits; and the sign bit. The number of leading zeros gives the offset from zero. Some examples for a 2 bit mantissa:
Value  zeros  stopbit  mantissa  signbit  total code
0             1        00        0        1000
1             1        01        0        1010
2             1        10        0        1100
4      0      1        00        0        01000
7      0      1        11        0        01110
8      00     1        00        0        001000
-1            1        00        1        1001
-2            1        01        1        1011
-7     0      1        10        1        01101
Note that negative numbers are offset by one as there is no need to have two zero codes. The technical report CUED/F-INFENG/TR.156 included with the shorten distribution as files tr154.tex and tr154.ps contains bugs in this format description and is superseded by this man page.
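The code layout described above can be sketched as follows. This is an illustration, not the code in shorten.c: the names encode and decode are hypothetical, bits are held in a Python string for readability, and the mantissa width defaults to the 2 bits used in the examples.

```python
def encode(value, mantissa_bits=2):
    """Encode one integer as zeros + stop bit + mantissa + sign bit."""
    sign = 1 if value < 0 else 0
    # Negative values are offset by one so that there is only one zero code.
    magnitude = -value - 1 if value < 0 else value
    zeros = magnitude >> mantissa_bits
    mantissa = magnitude & ((1 << mantissa_bits) - 1)
    mantissa_str = "".join(
        str((mantissa >> i) & 1) for i in reversed(range(mantissa_bits))
    )
    return "0" * zeros + "1" + mantissa_str + str(sign)


def decode(code, mantissa_bits=2):
    """Invert encode() for a string holding exactly one code word."""
    zeros = code.index("1")               # length of the leading run of zeros
    start = zeros + 1                     # first bit after the stop bit
    mantissa_str = code[start:start + mantissa_bits]
    mantissa = int(mantissa_str, 2) if mantissa_str else 0
    sign = code[start + mantissa_bits]
    magnitude = (zeros << mantissa_bits) | mantissa
    return -magnitude - 1 if sign == "1" else magnitude


print(encode(7))    # 01110
print(encode(8))    # 001000
print(encode(-7))   # 01101
```

Small magnitudes get the shortest codes, and each additional leading zero extends the range by one mantissa's worth of values, which is what makes the scheme a good match for a Laplacian density.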
Shorten may be used embedded within other programs. shorten is a function call implemented in the file shorten.c. The file main.c provides a wrapper for stand-alone operation. A simple example of embedded operation can be found in the file embedded.c. Full Windows DLL operation is provided in the windll subdirectory.
Exit status is normally 0. A warning is issued if the file is not properly aligned, i.e. a whole number of records could not be read at the end of the file.
No check is made for increasing file size, but valid waveform files generally achieve some compression. Even compressing a file of random bytes (which represents the worst case waveform file) only results in a small increase in the file length (about 6% for 8 bit data and 3% for 16 bit data). There is one condition that is known to be problematic: the lossy compression of unsigned data without mean estimation. Large file sizes may result if the mean is far from the middle range value. For these files the value of the -m switch should be non-zero, as it is by default in format version 2.
There is no provision for different channels containing different data types. Normally, this is not a restriction, but it does mean that if lossy coding is selected for the ulaw type, then all channels use lossy coding.
The technical report CUED/F-INFENG/TR.156 (included in the shorten distribution) contains errors in the bitfield format description and is superseded by this document.
See the file "change.log" for a history of bug fixes.
Please mail me immediately at the address below if you do find a bug.
Shorten is available for non-commercial use without fee. See the LICENSE file for the formal copying and usage restrictions. For supported versions please see http://www.softsound.com/Shorten.html and for commercial use please contact shorten@softsound.com