5 Derivative-Based Stochastic Search

We begin our discussion of adaptive learning methods in stochastic optimization by addressing problems where we have access to derivatives (or gradients, if $x$ is a vector) of our function $F(x, W)$. It is common to start with the asymptotic form of our basic stochastic optimization problem

\[
\max_{x \in \mathcal{X}} \mathbb{E}\left\{ F(x, W) \,\middle|\, S^0 \right\},
\tag{5.1}
\]
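A concrete instance helps fix ideas. In the classical newsvendor problem (a standard example of this problem class; the price $p$, cost $c$, and any specific demand model are our illustrative choices, not values from the text), we order $x$ units at unit cost $c$ and sell $\min(x, W)$ units at price $p$ against a random demand $W$:
\[
F(x, W) = p \min(x, W) - c\, x,
\]
so that (5.1) asks for the order quantity that maximizes expected profit given whatever we know initially, $S^0$.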

but soon we are going to shift attention to finding the best algorithm (or policy) for identifying the best solution within a finite budget. We are going to show that with any adaptive learning algorithm, we can define a state $S^n$ that captures what we know after $n$ iterations. We can represent any algorithm as a “policy” $X^\pi(S^n)$ which tells us the next point $x^n = X^\pi(S^n)$ given what we know, $S^n$, after $n$ iterations. Eventually we exhaust our budget of $N$ iterations and produce a solution that we call $x^{\pi,N}$ to indicate that it was found with policy (algorithm) $\pi$ after $N$ iterations.
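The loop implied by this description is short enough to write down. The sketch below is ours, not the book's code; the names `run_policy`, `sample_W`, and `update_state` are illustrative placeholders for whatever the state/policy template looks like in a given application:

```python
# A minimal sketch of running a policy X^pi for a budget of N iterations.
# All names here (run_policy, policy, sample_W, update_state) are our own
# illustrative choices; the book defines the framework, not this code.

def run_policy(policy, sample_W, update_state, S0, N):
    """Run policy X^pi for N iterations and return the solution x^{pi,N}."""
    S = S0                         # initial state S^0
    for n in range(N):
        x = policy(S)              # choose x^n = X^pi(S^n)
        W = sample_W()             # observe W^{n+1}, unknown when x^n was chosen
        S = update_state(S, x, W)  # fold (x^n, W^{n+1}) into the new state S^{n+1}
    return policy(S)               # report x^{pi,N}, the solution after N iterations
```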

After we choose $x^n$, we observe a random variable $W^{n+1}$ that is not known when we choose $x^n$. We then evaluate the performance through a function $F(x^n, W^{n+1})$, which can serve as a placeholder for a number of settings, including the results of a computer simulation, how a product performs in the market, the response of a patient to medication, or the strength of a material produced in a lab. The initial state $S^0$ might contain fixed parameters (say the boiling point of a material), ...
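Since this chapter assumes we can compute derivatives of $F(x, W)$, the simplest policy of this form is a stochastic gradient step, $x^{n+1} = x^n + \alpha_n \nabla_x F(x^n, W^{n+1})$. Below is a sketch for the newsvendor example above; the exponential demand model, the prices, and the stepsize rule are all our illustrative assumptions, not values from the text:

```python
import numpy as np

# Stochastic gradient ascent for the newsvendor example. The demand model,
# prices, and stepsize rule are illustrative assumptions.

rng = np.random.default_rng(0)
p, c = 10.0, 4.0           # unit price and unit cost (assumed)

def grad_F(x, W):
    """(Sub)gradient of F(x, W) = p*min(x, W) - c*x with respect to x."""
    return (p if x < W else 0.0) - c

x = 5.0                    # initial decision x^0
N = 10_000                 # iteration budget
for n in range(N):
    W = rng.exponential(scale=10.0)          # sample demand W^{n+1} (assumed model)
    alpha = 20.0 / (20.0 + n)                # a simple harmonic stepsize (assumed rule)
    x = max(0.0, x + alpha * grad_F(x, W))   # x^{n+1} = x^n + alpha_n * grad, kept >= 0

# For exponential demand, the maximizer is the (p - c)/p quantile of W:
# x* = -10 * ln(c / p) ~= 9.16, which x should approach.
print(f"x^(pi,N) ~= {x:.2f}")
```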
