CHAPTER 31
PITCH DETECTION
31.1 INTRODUCTION
In some of the previous chapters, we have stressed the model of speech and music production as consisting of one or more excitations that drive a time-variable filter. In this chapter we focus on the excitation model, and in particular on the extraction of pitch frequency. The time-variable filter that results in the spectral envelope can be estimated in different ways, including filter banks, cepstra, and linear prediction (see Chapters 19, 20, and 21), as well as combinations of these approaches (see Chapter 22).
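The excitation-plus-filter model and the idea of extracting pitch from its output can be illustrated with a small sketch. This is not the chapter's own method. It assumes a deliberately simple setup: an impulse train stands in for glottal excitation, a fixed two-pole resonator stands in for the time-variable filter, and pitch is estimated by picking the strongest autocorrelation lag. The function names (`synthesize`, `estimate_pitch`) and all parameter values are hypothetical choices made for illustration.

```python
import math


def synthesize(f0_hz, fs_hz, n_samples, formant_hz=700.0, bandwidth_hz=100.0):
    """Impulse-train excitation driving a fixed two-pole resonator.

    Assumption: a single resonance is a crude stand-in for the vocal-tract
    filter, and the pitch period is rounded to a whole number of samples.
    """
    period = round(fs_hz / f0_hz)
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)       # pole radius
    a1 = 2.0 * r * math.cos(2.0 * math.pi * formant_hz / fs_hz)
    a2 = -r * r
    y = [0.0] * n_samples
    for n in range(n_samples):
        x = 1.0 if n % period == 0 else 0.0             # excitation pulse
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y[n] = x + a1 * y1 + a2 * y2                    # resonator output
    return y


def estimate_pitch(signal, fs_hz, fmin_hz=60.0, fmax_hz=400.0):
    """Return fs / lag for the lag with the largest autocorrelation.

    Assumption: the search range [fmin_hz, fmax_hz] brackets the true pitch.
    """
    lag_min = int(fs_hz / fmax_hz)
    lag_max = int(fs_hz / fmin_hz)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(signal[n] * signal[n + lag]
                  for n in range(len(signal) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs_hz / best_lag


if __name__ == "__main__":
    fs = 8000.0
    frame = synthesize(100.0, fs, 800)
    print(estimate_pitch(frame, fs))
```

On this clean synthetic signal the strongest lag falls at the pitch period. Real speech is harder: the excitation is only nearly periodic, and a strong formant can produce competing autocorrelation peaks, so a bare peak picker like this one should be read only as a starting point.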
Modeling of the excitation function of speech requires paying particular attention to the following components: (a) the periodic or nearly periodic opening and closing of the glottis during voicing; (b) the shape of the glottal pressure pulse; (c) the position in the vocal system of the constriction that creates turbulent flow during unvoiced sound; (d) the nature of the excitation function during stop consonant articulation; (e) how voicing and turbulence combine during articulation of the voiced fricative sounds; and (f) possible nonlinear interactions between excitation and acoustic tube response.
In many ways, accurate modeling of the excitation parameters is more complex than modeling of the time-varying linear filter that we use to represent the vocal tract. Channel vocoder researchers in the 1950s must have been somewhat ...