Note that BladeEnc, 8hz-mp3, CDEX and LAME 2.1 all produce identical results. Only the BladeEnc result is given.
NOTE 6/99: This problem is fixed with the new mid/side switch added to LAME 3.12!
mstest, Mid/Side stereo encoding test sample (about 5 seconds)
The FhG encoder does an even better job on this sample, mostly because it detects some of the later castanets. They are muffled by other sounds, and GPSYCHO fails to recognize them as needing short blocks. Later on in the sample, the castanets come fast and furious, and even the FhG encoder cannot maintain enough bits in the bit reservoir. VBR would be great in this situation. It is very easy to put into an encoder, but I don't have a player to debug it with.
Normally you have to perform listening tests to determine the quality of an mp3 encoding. You generally cannot say anything about the quality by looking at the original and encoded pcm signals. Pre-echo problems like those in castanets.wav are an exception. In a bad encoding, the sharp attack of the castanets creates noise that is heard before the actual castanets. This flaw is very visible in the encoded pcm signal, and is shown for several different encoders in Screenshots.
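To make "visible in the pcm signal" concrete, here is a minimal sketch of how the pre-echo could be measured rather than eyeballed. It is not code from LAME: the function names, the threshold, and the 576-sample window are illustrative assumptions, and it assumes the decoded signal has already been time-aligned with the original (decoder delay removed).

```python
def find_attack(samples, threshold):
    """Return the index of the first sample whose magnitude reaches
    the threshold, or None if there is no such sample.
    (Hypothetical helper; a real detector would be more careful.)"""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def pre_echo_energy(original, decoded, attack_index, window=576):
    """Sum of squared error in the `window` samples just before the attack.

    Assumes `original` and `decoded` are time-aligned sequences of floats.
    A large value means the encoder smeared noise ahead of the attack.
    """
    start = max(0, attack_index - window)
    return sum(
        (d - o) ** 2
        for o, d in zip(original[start:attack_index], decoded[start:attack_index])
    )


# Toy data: silence followed by a sharp attack.
original = [0.0] * 1000 + [1.0, -0.8, 0.6]

# A "bad" decode: noise leaks into the 100 samples before the attack.
bad = list(original)
for i in range(900, 1000):
    bad[i] = 0.05

attack = find_attack(original, 0.5)
print("attack at sample", attack)
print("pre-echo energy, bad decode:  ", pre_echo_energy(original, bad, attack))
print("pre-echo energy, clean decode:", pre_echo_energy(original, original, attack))
```

Comparing this number across encoders for the same attack gives a rough ranking of pre-echo severity. It says nothing about the rest of the encoding quality, which still needs listening tests.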
With the castanets.wav file it's easy to try out new short block detection schemes. You don't have to rely on listening tests, since the pre-echo is so easy to see in the output pcm data. Just modify the graphical interface to display the new criterion, then go through castanets.wav frame by frame and see if it is triggered in the correct spots. For an interesting comparison, run lame with -g (the graphical frame analyzer) on MP3 files produced by other encoders to see how well they do.
Castanets, FhG reference sample (about 5 seconds)
This song contains a lot of very tonal piano music, for which even the ISO encoder usually does OK. But in certain situations (particularly in frames 50-70) it produces very noticeable distortion in the piano notes. GPSYCHO fixes this, mostly due to the improved outer_loop in the bit allocation subroutine. This sample also has some attacks (drums) that are greatly improved with GPSYCHO. I cannot detect a difference between GPSYCHO and FhG for this sample.
Elsewhere, Sarah McLachlan (5 second sample)
FhG does great. They seem to have excellent pre-echo detection. I would love to know what their algorithm is based on.
Note 5/99: LAME 3.05 has a much improved pre-echo detection algorithm, and fixes most of the above problems!
One thing I would like to try is switching to a 768 FFT instead of 1024. The FFT is used to compute the energies in the 576 sample (1 granule) window. With an FFT of almost twice the size of the granule, the higher frequency energies within the granule are easily contaminated by data from outside the granule. Looking at the spectrum with MP3x, you can see that the signal is dominated by higher than normal frequencies which change substantially from granule to granule.
For example, one wavelength of a 1kHz signal spans about 44 sample points (at a 44.1kHz sampling rate), so 13 wavelengths will fit in one granule. Estimating the energy in the 1kHz mode with a 1024 FFT will use the 13 wavelengths within the granule plus 5 wavelengths on either side of the granule. This is fine if the signal is very tonal, meaning the energy does not change much from granule to granule, but this is not the case for the 1kHz signal in applaud.wav. A 768 FFT would only consider 2 extra wavelengths on each side of the granule, and they would be mostly in the taper of the Hann window.
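The arithmetic behind those numbers can be checked with a few lines of Python. This is only a worked example, assuming a 44.1kHz sampling rate and an FFT window centered on the 576 sample granule; the function name is made up for illustration.

```python
def wavelengths(fft_size, freq_hz, sample_rate=44100, granule=576):
    """Return (wavelengths inside the granule,
               wavelengths on each side of the granule covered by the FFT).

    Assumes the FFT window is centered on the granule, so the extra
    samples are split evenly between the two sides.
    """
    period = sample_rate / freq_hz            # samples per wavelength
    inside = granule / period
    per_side = (fft_size - granule) / 2 / period
    return inside, per_side


for n in (1024, 768):
    inside, per_side = wavelengths(n, 1000)
    print(f"{n} FFT: {inside:.1f} wavelengths in the granule, "
          f"{per_side:.1f} extra on each side")
```

With the 1024 FFT, 224 samples on each side of the granule contribute to the estimate. With the 768 FFT it is only 96 samples per side.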
Another possibility would be to try and estimate the energy from the 3 overlapping 256 FFTs used to compute the high frequency tonality.
If anyone has other suggestions, let me know!
Information on the applaud.wav test sample: