A Free Audio Compression Format?
Based on what I've learned working with LAME and
GPSYCHO,
I believe it would not be too difficult to develop an independent audio
codec of slightly better quality than LAME (and thus comparable to the
best commercial MP3 codecs). Yes, it won't be as good as AAC,
but think of this: how many people would use a proprietary compression
code which was slightly better than gzip?
The following is an outline of such an audio codec. It is based
on general published ideas which form the basis of several encoders (MP3,
MPEG4-AAC, AT&T PAC). It includes most of the ideas that
were used by the MP3 format, but removes many of the components in MP3
which are inherited from layer I and layer II. I have also added
some newer ideas that were utilized by AAC and PAC. Some
of the more sophisticated ideas such as temporal noise shaping and predictive
coding are not included.
The problem with all of this: many of these fundamental ideas
are patented in countries which allow patents on algorithms. There
is still a difference between ideas and algorithms, so it may be possible
to implement this codec using different algorithms for the same ideas.
It will require a significant amount of legal work to make this determination.
If your beliefs do not coincide with the patent holder's beliefs, you could
be sued and the courts will decide. If you don't have the money for
such a lawsuit, then that is the end of the project!
Just a cursory patent search will yield dozens of patents on every
aspect of audio compression. Below I have referenced some of these
patents along with my uninformed interpretation of what they claim.
Frame/Window types
-
1024 and 128 (for pre-echo) sample windows. MP3 uses 576 & 192
sample windows. AAC uses 1024 and 128 sample windows. (Brandenburg
& Stoll 1994, Bosi et al. 1997 in References)
-
Spectral coefficients computed with an overlapping MDCT (a lossless
transform). MP3 applies the MDCT only after first splitting
the signal into frequency bands with a windowed filterbank; AAC, outside
its scalable-sampling-rate profile, applies the MDCT directly.
-
Pre-echo detection from the GPSYCHO
algorithm. The GPSYCHO pre-echo detection algorithm is truly
original, although it is such a simple concept that I'm sure someone has
patented it.
The very concept of using spectral transforms applied to frames of PCM
samples seems to be patented (US5579430). But I believe
spectral transforms (or filterbanks) must be used because psycho-acoustic
information is given in terms of spectral coefficients (the frequency domain).
The majority of audio compression comes from allocating bits between different
frequency bands based on psycho-acoustic information.
The concept of window switching to reduce pre-echo effects is patented
in US5285498.
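The overlapping MDCT mentioned above can be sketched in a few lines. This is a minimal, slow O(N^2) illustration, not code from LAME or any real codec: the function names, the sine window, and the 2/N inverse scaling are assumptions chosen so that windowed overlap-add reconstructs the input exactly. A real encoder would use N = 1024 or N = 128 and an FFT-based transform.

```python
import math

def sine_window(length):
    # Sine window: satisfies w[n]^2 + w[n + N]^2 = 1 for length = 2N.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    # 2N windowed samples -> N spectral coefficients.
    n_coef = len(block) // 2
    n0 = 0.5 + n_coef / 2
    return [sum(block[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for n in range(2 * n_coef))
            for k in range(n_coef)]

def imdct(coeffs):
    # N coefficients -> 2N time-aliased samples (scaled for use with a window).
    n_coef = len(coeffs)
    n0 = 0.5 + n_coef / 2
    return [(2.0 / n_coef) *
            sum(coeffs[k] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
                for k in range(n_coef))
            for n in range(2 * n_coef)]

def analysis_synthesis(signal, n_coef):
    # Transform overlapping frames and overlap-add the inverse transforms.
    # Assumes len(signal) is a multiple of n_coef.
    w = sine_window(2 * n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        frame = [padded[start + n] * w[n] for n in range(2 * n_coef)]
        rebuilt = imdct(mdct(frame))
        for n in range(2 * n_coef):
            out[start + n] += rebuilt[n] * w[n]
    return out[n_coef:n_coef + len(signal)]
```

Each frame of 2N samples yields only N coefficients, yet the aliasing in neighbouring frames cancels in the overlap-add, which is why the transform itself loses nothing.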
Critical Bands
-
Group coefficients in critical bands. MP3 uses 21 for long windows,
12 for short. AAC uses 49 for long windows, 14 for short.
-
Allow option of mid/side encoding for
each critical band. MP3 does not allow mid/side encoding on
a band-by-band basis. AT&T PAC does. (Johnston &
Ferreira 1992, in References)
Critical bands are a way to group frequency bands which better mimics the
response of the human ear. The concept is old, but there
may be patents on the use of critical bands for audio compression.
The concept of mid/side encoding is patented in US5481614.
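A per-band mid/side switch might look like the sketch below. The band edges, the function names, and the decision rule (pick whichever representation has the smaller sum of absolute values, a crude stand-in for bit cost) are all assumptions made for illustration; a real encoder would base the decision on its psycho-acoustic model.

```python
import math

SQRT2 = math.sqrt(2.0)

def mid_side(a, b):
    # Orthonormal sum/difference transform. Applying it twice returns the
    # original pair, so the same function serves as its own inverse.
    first = [(x + y) / SQRT2 for x, y in zip(a, b)]
    second = [(x - y) / SQRT2 for x, y in zip(a, b)]
    return first, second

def encode_bands(left, right, band_edges):
    # For each band, keep left/right or switch to mid/side.
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        l, r = left[lo:hi], right[lo:hi]
        m, s = mid_side(l, r)
        cost_lr = sum(abs(x) for x in l) + sum(abs(x) for x in r)
        cost_ms = sum(abs(x) for x in m) + sum(abs(x) for x in s)
        if cost_ms < cost_lr:
            bands.append(("MS", m, s))
        else:
            bands.append(("LR", l, r))
    return bands

def decode_bands(bands):
    # Undo the per-band choice and concatenate the bands again.
    left, right = [], []
    for mode, a, b in bands:
        if mode == "MS":
            a, b = mid_side(a, b)
        left.extend(a)
        right.extend(b)
    return left, right
```

With this rule, a band where both channels are nearly identical goes to mid/side (the side signal is close to zero), while a band present in only one channel stays left/right.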
Quantization of MDCT coefficients
-
Associated with each critical band is a scale factor. The larger the
scale factor, the more bits are allocated to that critical band.
-
Truncate (MDCT coefficient * scale factor) to an integer. This is all
that is meant by quantization.
-
Choose scale factors so quantization distortion in each critical band is
less than the masking computed by the psycho-acoustic model.
-
If more compression is desired (with some distortion), choose scale factors
with the GPSYCHO algorithm. Compression
can be controlled to produce a given bitrate or a given quality.
The use of scale factors to control the allocation of bits between scale
factor bands is patented. Even worse, just the concept of allocating
bits among critical bands based on any set of external requirements is
patented. (US5579430)
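The quantization steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the masking threshold is taken as an allowed noise energy per band, and the search simply raises the scale factor in steps of 2^(1/4) until the distortion fits under the mask. It is not the GPSYCHO algorithm.

```python
def quantize_band(coeffs, scale):
    # Truncate coefficient * scale factor to an integer (toward zero).
    return [int(c * scale) for c in coeffs]

def dequantize_band(quantized, scale):
    # Decoder side: undo the scaling.
    return [q / scale for q in quantized]

def band_distortion(coeffs, scale):
    # Quantization noise energy in one band for a given scale factor.
    rebuilt = dequantize_band(quantize_band(coeffs, scale), scale)
    return sum((c - r) ** 2 for c, r in zip(coeffs, rebuilt))

def choose_scale(coeffs, mask, start=1.0, step=2 ** 0.25, max_iter=200):
    # Smallest scale factor on the search grid whose distortion does not
    # exceed the masking threshold (assumed to be an allowed noise energy).
    scale = start
    for _ in range(max_iter):
        if band_distortion(coeffs, scale) <= mask:
            break
        scale *= step
    return scale
```

A band with a high masking threshold ends up with a small scale factor, so its quantized values are small integers that cost few bits; a band with a low threshold gets a larger scale factor and more bits.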
Lossless compression of quantized MDCT coefficients
-
Some type of lossless compression and encoding of quantized data.
MP3 uses Huffman coding with precomputed tables each assigned a unique
code.
The type of Huffman coding used in MP3 is patented (US5579430).
Are there other types of Huffman coding which we could use?
Is the concept of precomputed tables patentable? Or are just the
tables themselves patented? A version of the algorithm in gzip, optimized
for audio frames, would probably be best.
But just the very fact of using optimized encoding is claimed to be
patented! (US5579430).
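For illustration only, here is textbook Huffman coding applied to one frame of quantized values. Unlike MP3, it builds the table from the frame itself instead of selecting a precomputed one, and it represents bits as a string of "0" and "1" characters for readability; the function names are hypothetical.

```python
import heapq
from collections import Counter

def huffman_table(symbols):
    # Build a prefix code: frequent symbols get shorter codes.
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, tiebreak, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    # Prefix codes can be decoded greedily, one bit at a time.
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Quantized MDCT frames are dominated by zeros and small integers, which is exactly the skewed distribution that makes this kind of entropy coding pay off.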
Psycho-acoustic model (output used during quantization step)
-
Masking given by a linear function expressed in critical bands
-
Strength of masking given from tonality of signal
-
Tonality estimated by a measure of the predictability of the signal.
-
Johnston (1988) and Brandenburg & Johnston (1990), in References
The algebraic formulas for these quantities are in the published literature.
The tonality formula is patented in US5040217.
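As a rough sketch of what "predictability of the signal" can mean, the code below predicts each complex spectral coefficient by linearly extrapolating magnitude and phase from the two previous frames, then measures how far the actual value lands from the prediction. The function name and the exact normalization are assumptions on my part; consult the cited papers for the published formulas. The result is near 0 for steady tones and near 1 for noise-like content.

```python
import cmath

def unpredictability(prev2, prev1, current):
    # Each argument is a list of complex spectral coefficients (for example
    # from an FFT) for three consecutive frames, oldest first.
    result = []
    for x2, x1, x in zip(prev2, prev1, current):
        # Linear extrapolation of magnitude and phase.
        r_pred = 2 * abs(x1) - abs(x2)
        phi_pred = 2 * cmath.phase(x1) - cmath.phase(x2)
        x_pred = cmath.rect(r_pred, phi_pred)
        denom = abs(x) + abs(r_pred)
        # Normalized prediction error, always within [0, 1].
        result.append(abs(x - x_pred) / denom if denom > 0 else 0.0)
    return result
```

A tonality estimate per critical band would then be derived from these values, with more tonal (predictable) bands treated differently from noisy ones when computing the masking.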
Some of this has already been done. Check out this
project.