4.5 ANALYSIS-BY-SYNTHESIS LINEAR PREDICTION
In closed-loop source-system coders (Figure 4.7), the excitation source is determined by closed-loop or analysis-by-synthesis (A-by-S) optimization. The optimization determines an excitation sequence that minimizes the perceptually weighted mean-square error (MSE) between the input speech and the reconstructed speech [Atal82b] [Sing84] [Schr85]. Closed-loop LP combines the spectral modeling properties of vocoders with the waveform-matching attributes of waveform coders; hence, A-by-S LP coders are also called hybrid LP coders. As shown in Figure 4.7, the system consists of a short-term LP synthesis filter, 1/A(z), and a long-term prediction (LTP) synthesis filter, 1/A_L(z). The perceptual weighting filter (PWF), W(z), shapes the error such that quantization noise is masked by the high-energy formants. The PWF is given by
$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} = \frac{1 - \sum_{i=1}^{L} \gamma_1^{i} a_i z^{-i}}{1 - \sum_{i=1}^{L} \gamma_2^{i} a_i z^{-i}},$$
where γ1 and γ2 are the adaptive weights and L is the order of the linear predictor. Typically, γ1 ranges from 0.94 to 0.98, and γ2 varies between 0.4 and 0.7, depending upon the tilt or the flatness characteristics associated with the LPC spectral envelope [Sala98] [Bess02]. The role of W(z) is to de-emphasize the error energy in the formant regions [Schr79]. This de-emphasis strategy is based on the fact that quantization noise in the formant regions is partially masked by speech. From Figure 4.7, note that a gain factor, g, scales ...
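The weighting and closed-loop selection described above can be illustrated with a minimal Python sketch. It assumes the common form W(z) = A(z/γ1)/A(z/γ2) with the sign convention A(z) = 1 − Σ a_i z^−i; the function names (`bandwidth_expand`, `perceptual_weighting`, `closed_loop_search`), the fixed-codebook search, and the default weights (0.94 and 0.6, picked from the ranges quoted in the text) are illustrative assumptions, not the coder of Figure 4.7. The LTP synthesis filter is left out to keep the example short.

```python
import numpy as np
from scipy.signal import lfilter


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma) for a = [a_1, ..., a_L].

    Assumed convention: A(z) = 1 - sum_i a_i z^-i, so the i-th
    coefficient of A(z/gamma) is -gamma**i * a_i.
    """
    a = np.asarray(a, dtype=float)
    powers = gamma ** np.arange(1, len(a) + 1)
    return np.concatenate(([1.0], -a * powers))


def perceptual_weighting(x, a, gamma1=0.94, gamma2=0.6):
    """Filter x through W(z) = A(z/gamma1) / A(z/gamma2)."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    return lfilter(num, den, x)


def closed_loop_search(target, a, codebook, gamma1=0.94, gamma2=0.6):
    """Return (index, gain, error) of the candidate excitation that
    minimizes the perceptually weighted squared error.

    Hypothetical simplification: short-term synthesis 1/A(z) only,
    zero initial filter state, exhaustive search over the codebook.
    """
    a_poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    target_w = perceptual_weighting(target, a, gamma1, gamma2)
    best = (None, 0.0, np.inf)
    for k, excitation in enumerate(codebook):
        synth = lfilter([1.0], a_poly, excitation)  # synthesize candidate
        synth_w = perceptual_weighting(synth, a, gamma1, gamma2)
        energy = np.dot(synth_w, synth_w)
        if energy == 0.0:
            continue
        gain = np.dot(target_w, synth_w) / energy  # optimal gain g
        err = np.sum((target_w - gain * synth_w) ** 2)
        if err < best[2]:
            best = (k, gain, err)
    return best
```

Setting γ1 equal to γ2 makes W(z) = 1, so the search reduces to plain waveform matching; moving γ2 below γ1 shifts more of the allowed error toward the formant regions, where the text notes it is partially masked by the speech.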