Learning to Optimally Stop Diffusion Processes
We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework developed by Wang et al. (2020). By penalizing the corresponding variational inequality, we transform the stopping problem into a stochastic optimal control problem with two actions. We then randomize the controls into Bernoulli distributions and add an entropy regularizer to encourage exploration. We derive a semi-analytical optimal Bernoulli distribution, use it to devise RL algorithms based on the martingale approach established in Jia and Zhou (2022a), and prove a policy improvement theorem. Finally, we demonstrate the effectiveness of the algorithms by pricing finite-horizon American put options, and show that both the offline and online algorithms achieve high accuracy in learning the value functions and characterizing the associated free boundaries. Joint work with Min Dai, Yu Sun and Zuo Quan Xu.
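
To illustrate why an entropy-regularized choice between two actions admits a closed-form Bernoulli optimum, the following is a minimal sketch of the generic calculation, not the formula derived in the talk. It assumes a stylized one-step objective p * advantage + temperature * H(p), where H is the Shannon entropy of a Bernoulli(p) distribution. The names `advantage` (a stand-in for the scaled gain from stopping rather than continuing) and `temperature` (the weight on the entropy term) are hypothetical, and the actual scaling in the paper may differ. Setting the derivative to zero gives the logistic function of advantage / temperature, which the code checks against a brute-force grid search.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Shannon entropy (natural log) of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regularized_objective(p: float, advantage: float, temperature: float) -> float:
    """Stylized objective: expected gain from stopping plus an entropy bonus.

    `advantage` and `temperature` are illustrative names (assumptions),
    not notation taken from the source.
    """
    return p * advantage + temperature * bernoulli_entropy(p)


def optimal_stop_probability(advantage: float, temperature: float) -> float:
    """Closed-form maximizer of the stylized objective over p in [0, 1].

    First-order condition: advantage + temperature * log((1 - p) / p) = 0,
    so p = 1 / (1 + exp(-advantage / temperature)).
    """
    z = advantage / temperature
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable branch for negative z
    return e / (1.0 + e)


def grid_argmax(advantage: float, temperature: float, n: int = 10001) -> float:
    """Brute-force maximizer on a uniform grid, used only as a sanity check."""
    best_p, best_val = 0.0, -math.inf
    for i in range(n):
        p = i / (n - 1)
        val = regularized_objective(p, advantage, temperature)
        if val > best_val:
            best_p, best_val = p, val
    return best_p


if __name__ == "__main__":
    for a, lam in [(0.3, 0.5), (-1.0, 0.2), (0.0, 1.0)]:
        print(a, lam, optimal_stop_probability(a, lam), grid_argmax(a, lam))
```

In this sketch, a smaller `temperature` pushes the stopping probability toward 0 or 1, so exploration vanishes as the entropy weight shrinks, while a zero `advantage` yields a fair coin flip.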