5 Derivative-Based Stochastic Search

We begin our discussion of adaptive learning methods in stochastic optimization by addressing problems where we have access to derivatives (or gradients, if $x$ is a vector) of our function $F(x, W)$. It is common to start with the asymptotic form of our basic stochastic optimization problem

\[
\max_{x \in \mathcal{X}} \mathbb{E}\left\{ F(x, W) \,\middle|\, S^0 \right\},
\tag{5.1}
\]
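A concrete instance helps fix ideas. In the classical newsvendor problem (a standard example of this problem class; the price $p$, cost $c$, and any specific demand model are our illustrative choices, not values from the text), we order $x$ units at unit cost $c$ and sell $\min(x, W)$ units at price $p$ against a random demand $W$:
\[
F(x, W) = p \min(x, W) - c\, x,
\]
so that (5.1) asks for the order quantity that maximizes expected profit given whatever we know initially, $S^0$.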

but soon we are going to shift attention to finding the best algorithm (or policy) for identifying the best solution within a finite budget. We are going to show that with any adaptive learning algorithm, we can define a state $S^n$ that captures what we know after $n$ iterations. We can represent any algorithm as a “policy” $X^\pi(S^n)$ which tells us the next point $x^n = X^\pi(S^n)$ given what we know, $S^n$, after $n$ iterations. Eventually we exhaust our budget of $N$ iterations and produce a solution that we call $x^{\pi,N}$ to indicate that it was found with policy (algorithm) $\pi$ after $N$ iterations.
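The loop implied by this description is short enough to write down. The sketch below is ours, not the book's code; the names `run_policy`, `sample_W`, and `update_state` are illustrative placeholders for whatever the state/policy template looks like in a given application:

```python
# A minimal sketch of running a policy X^pi for a budget of N iterations.
# All names here (run_policy, policy, sample_W, update_state) are our own
# illustrative choices; the book defines the framework, not this code.

def run_policy(policy, sample_W, update_state, S0, N):
    """Run policy X^pi for N iterations and return the solution x^{pi,N}."""
    S = S0                         # initial state S^0
    for n in range(N):
        x = policy(S)              # choose x^n = X^pi(S^n)
        W = sample_W()             # observe W^{n+1}, unknown when x^n was chosen
        S = update_state(S, x, W)  # fold (x^n, W^{n+1}) into the new state S^{n+1}
    return policy(S)               # report x^{pi,N}, the solution after N iterations
```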

After we choose $x^n$, we observe a random variable $W^{n+1}$ that is not known when we choose $x^n$. We then evaluate the performance through a function $F(x^n, W^{n+1})$, which can serve as a placeholder for a number of settings, including the results of a computer simulation, how a product performs in the market, the response of a patient to medication, or the strength of a material produced in a lab. The initial state $S^0$ might contain fixed parameters (say the boiling point of a material), ...
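Since this chapter assumes we can compute derivatives of $F(x, W)$, the simplest policy of this form is a stochastic gradient step, $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$. Below is a sketch for the newsvendor example above; the exponential demand model, the prices, and the stepsize rule are all our illustrative assumptions, not values from the text:

```python
import numpy as np

# Stochastic gradient ascent for the newsvendor example. The demand model,
# prices, and stepsize rule are illustrative assumptions.

rng = np.random.default_rng(0)
p, c = 10.0, 4.0           # unit price and unit cost (assumed)

def grad_F(x, W):
    """(Sub)gradient of F(x, W) = p*min(x, W) - c*x with respect to x."""
    return (p if x < W else 0.0) - c

x = 5.0                    # initial decision x^0
N = 10_000                 # iteration budget
for n in range(N):
    W = rng.exponential(scale=10.0)          # sample demand W^{n+1} (assumed model)
    alpha = 20.0 / (20.0 + n)                # a simple harmonic stepsize (assumed rule)
    x = max(0.0, x + alpha * grad_F(x, W))   # x^{n+1} = x^n + alpha_n * grad, kept >= 0

# For exponential demand, the maximizer is the (p - c)/p quantile of W:
# x* = -10 * ln(c / p) ~= 9.16, which x should approach.
print(f"x^(pi,N) ~= {x:.2f}")
```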
