Chapter 4
EM Optimization Methods
The expectation–maximization (EM) algorithm is an iterative optimization strategy motivated by a notion of missingness and by consideration of the conditional distribution of what is missing given what is observed. The strategy's statistical foundations and effectiveness in a variety of statistical problems were shown in a seminal paper by Dempster, Laird, and Rubin [150]. Other references on EM and related methods include [409, 413, 449, 456, 625]. The popularity of the EM algorithm stems from how simple it can be to implement and how reliably it can find the global optimum through stable, uphill steps.
In a frequentist setting, we may conceive of observed data generated from random variables X along with missing or unobserved data from random variables Z. We envision complete data generated from Y = (X, Z). Given observed data x, we wish to maximize a likelihood L(θ|x). Often it will be difficult to work with this likelihood and easier to work with the densities of Y|θ and Z|(x, θ). The EM algorithm sidesteps direct consideration of L(θ|x) by working with these easier densities.
In a Bayesian application, interest often focuses on estimating the mode of a posterior distribution f(θ|x). Again, optimization can sometimes be simplified by consideration of unobserved random variables ψ in addition to the parameters of interest, θ.
The missing data may not truly be missing: They may be only a conceptual ploy that simplifies the problem. In this case, ...