Based on Bosi et al., "ISO/IEC MPEG-2 AAC", J. Audio Eng. Soc. 45 (1997), pp. 789-814.
Another good complement to the ISO documentation is Brandenburg & Stoll, "ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio", J. Audio Eng. Soc. 42 (1994), pp. 780-792.
The goal of the outer_loop routine in MP3 is to find the combination of scalefactors, one per scalefactor band, that produces the least audible distortion. Audible distortion is distortion in a scalefactor band that exceeds the masking threshold computed by the psycho-acoustic model.
Pseudo-Code:

initialize all scalefactors to 0.
compute initial q = quantization step size (bin_search_stepsize)
    divide & conquer algorithm to find approximate value of q

outer_loop:
do {
    compute quantization with given scalefactors and not too many bits
        (call inner_loop)
    calc_noise():
        compute distortion within each scalefactor band
        compare distortion to allowed distortion (from psy-model)
        over = number of scalefactor bands where distortion > allowed_distortion
    if this quantization is the best one found so far, save it
    if over=0 we are done, exit.
    otherwise do ONE of the following (not both)
        turn pre-emphasis on
        amplify scalefactors for bands with audible distortion
} while over<>0 and !(all scalefactors set to their max)
Restore BEST quantization
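
The loop above can be sketched in Python. This is only a toy model under loud assumptions, not the LAME or ISO code: it uses a uniform quantizer instead of MP3's power-law quantizer, a crude bit-count estimate instead of Huffman tables, a linear step-size scan instead of the bin_search_stepsize divide & conquer search, and it skips the pre-emphasis option entirely. The helper names (quantize, count_bits) and the limit sf_max=15 are hypothetical.

```python
def quantize(coeffs, scalefacs, step):
    # Toy uniform quantizer: each scalefactor unit amplifies a band by sqrt(2).
    return [[round(x * 2 ** (0.5 * sf) / step) for x in band]
            for band, sf in zip(coeffs, scalefacs)]

def count_bits(quantized):
    # Crude cost estimate: magnitude bits plus a sign bit per nonzero value.
    return sum(abs(v).bit_length() + 1 for band in quantized for v in band if v)

def inner_loop(coeffs, scalefacs, max_bits):
    # Coarsen the step size until the quantized spectrum fits the bit budget.
    step = 2 ** -6
    while True:
        q = quantize(coeffs, scalefacs, step)
        if count_bits(q) <= max_bits:
            return step, q
        step *= 2 ** 0.25

def calc_noise(coeffs, scalefacs, step, q):
    # Mean squared reconstruction error in each scalefactor band.
    noise = []
    for band, sf, qband in zip(coeffs, scalefacs, q):
        gain = 2 ** (0.5 * sf)
        err = [(x - v * step / gain) ** 2 for x, v in zip(band, qband)]
        noise.append(sum(err) / len(err))
    return noise

def outer_loop(coeffs, allowed, max_bits, sf_max=15):
    scalefacs = [0] * len(coeffs)
    best = None  # (over, scalefacs, step) of the best quantization so far
    while True:
        step, q = inner_loop(coeffs, scalefacs, max_bits)
        noise = calc_noise(coeffs, scalefacs, step, q)
        distorted = [n > a for n, a in zip(noise, allowed)]
        over = sum(distorted)
        if best is None or over < best[0]:
            best = (over, list(scalefacs), step)
        if over == 0 or all(sf == sf_max for sf in scalefacs):
            break
        # Amplify only the bands with audible distortion.
        changed = False
        for b, bad in enumerate(distorted):
            if bad and scalefacs[b] < sf_max:
                scalefacs[b] += 1
                changed = True
        if not changed:
            break  # every distorted band is already at its maximum
    return best  # "Restore BEST quantization"

over, scalefacs, step = outer_loop(
    [[10.0, -8.0, 6.0], [0.5, -0.4, 0.3]], allowed=[1.0, 0.001], max_bits=12)
print(over, scalefacs, step)
```

The returned tuple describes the best quantization seen, not the last one tried, which is the point made in the discussion of the ISO model.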
Whenever a scalefactor band is amplified, the next quantization is forced to use more bits for that band. More bits are spent encoding the MDCT coefficients in that band, so its quantization error goes down. That is why bands with audible distortion are amplified. However, this also leaves fewer bits for the unamplified bands. Those bands had a quantization error below the allowed masking threshold, so hopefully they can tolerate a little more noise. The whole procedure is designed to allocate the bits to the bands which need them the most.
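
A small numeric illustration of this trade-off, as a sketch only: it assumes a uniform toy quantizer with a fixed step size and a gain of sqrt(2) per scalefactor unit, and it counts magnitude bits as a stand-in for the real Huffman cost. The function names band_noise and band_bits are made up for this example.

```python
def band_noise(band, scalefac, step):
    # Mean squared error after amplifying, quantizing and de-amplifying.
    gain = 2 ** (0.5 * scalefac)
    return sum((x - round(x * gain / step) * step / gain) ** 2
               for x in band) / len(band)

def band_bits(band, scalefac, step):
    # Rough cost: number of magnitude bits of each quantized value.
    gain = 2 ** (0.5 * scalefac)
    return sum(abs(round(x * gain / step)).bit_length() for x in band)

band = [0.3, -0.7, 1.1]
for sf in (0, 4):
    print(sf, band_bits(band, sf, 1.0), band_noise(band, sf, 1.0))
```

With the step size held at 1.0, raising the scalefactor from 0 to 4 takes the band from 2 to 6 magnitude bits and cuts its noise from about 0.063 to about 0.005. In the real encoder the total bit budget is fixed, so those extra bits have to come out of the other bands.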
When the loop is done, if we found a quantization with over=0, everything is great. Otherwise, we have to choose the best quantization that we found. The ISO model chooses the last quantization tried during outer_loop. This is strange, because the last one is usually one of the worst. The MPEG-2 AAC paper makes the obvious point that after trying out all the different combinations, you should choose the BEST one, not the LAST one! GPSYCHO defines the BEST quantization as the one with the smallest value of "over". Among quantizations with the same value of "over", LAME takes the one with the largest scalefactors. This will be the quantization with the best resolution in the bands where there is audible distortion.
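
The selection rule can be written as a comparison function. This is a hedged sketch rather than the LAME source: the text does not say how "largest scalefactors" is measured, so the sum of the scalefactors is used here as an assumption, and the names is_better and pick_best are hypothetical.

```python
def is_better(candidate, best):
    # Each quantization is summarized as a tuple (over, scalefacs).
    cand_over, cand_sf = candidate
    best_over, best_sf = best
    if cand_over != best_over:
        return cand_over < best_over    # fewer audibly distorted bands wins
    return sum(cand_sf) > sum(best_sf)  # tie: larger scalefactors win (assumed: by sum)

def pick_best(candidates):
    # Keep the best quantization seen so far instead of the last one tried.
    best = None
    for cand in candidates:
        if best is None or is_better(cand, best):
            best = cand
    return best

tried = [(3, [0, 0, 0]), (1, [1, 0, 2]), (1, [2, 1, 2]), (2, [4, 4, 4])]
print(pick_best(tried))
```

Here the last quantization tried has over=2, while the one selected has over=1 and the larger scalefactors of the two candidates that tie on "over".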
If you have ideas for a better way to define the BEST quantization, let me know!
Gabriel Bouvigne makes the following point: Which do you think is worse: