Towards on-sky adaptive optics control using reinforcement learning
Authors:
J. Nousiainen,
C. Rajani,
M. Kasper,
T. Helin,
S. Y. Haffert,
C. Vérinaud,
J. R. Males,
K. Van Gorkom,
L. M. Close,
J. D. Long,
A. D. Hedglen,
O. Guyon,
L. Schatz,
M. Kautz,
J. Lumbres,
A. Rodack,
J. M. Knight,
K. Miller
Abstract:
The direct imaging of potentially habitable exoplanets is a prime science case for the next generation of high-contrast imaging instruments on ground-based extremely large telescopes. To reach this demanding science goal, the instruments are equipped with eXtreme Adaptive Optics (XAO) systems, which will control thousands of actuators at frame rates of a kilohertz to several kilohertz. Most habitable exoplanets are located at small angular separations from their host stars, where the control laws of current XAO systems leave strong residuals. Current AO control strategies, such as static matrix-based wavefront reconstruction and integrator control, suffer from temporal delay error and are sensitive to mis-registration, i.e., to dynamic variations of the control system geometry. We aim to produce control methods that cope with these limitations, provide a significantly improved AO correction and, therefore, reduce the residual flux in the coronagraphic point spread function.
We extend previous work in Reinforcement Learning for AO. The improved method, called PO4AO, learns a dynamics model and optimizes a control neural network, called a policy. We introduce the method and study it through numerical simulations of XAO with Pyramid wavefront sensing for the 8-m and 40-m telescope aperture cases. We further implemented PO4AO and carried out experiments in a laboratory environment using MagAO-X at the Steward laboratory. PO4AO provides the desired performance by improving the coronagraphic contrast in numerical simulations by factors of 3-5 within the control region of the DM and Pyramid WFS, in simulation and in the laboratory. The presented method is also quick to train, i.e., on timescales of typically 5-10 seconds, and the inference time is sufficiently small (< ms) to be used in real-time control for XAO with currently available hardware, even for extremely large telescopes.
Submitted 16 May, 2022;
originally announced May 2022.
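The abstract above names integrator control and its temporal delay error as the baseline that PO4AO aims to improve on. A minimal sketch of that baseline for a single mode is given below. It is not taken from the paper: the disturbance (a slow sinusoid plus a random walk), the two-frame delay, and the gain grid are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_integrator(disturbance, gain, delay=2):
    """Closed-loop integrator for one mode.

    The update computed from the residual at frame t only reaches the
    mirror `delay` frames later, which is the source of temporal error.
    """
    n = len(disturbance)
    command = np.zeros(n + delay)  # command[t] is the correction applied at frame t
    residual = np.zeros(n)
    for t in range(n):
        residual[t] = disturbance[t] - command[t]
        # Integrator update, taking effect `delay` frames after the measurement.
        command[t + delay] = command[t + delay - 1] + gain * residual[t]
    return residual

def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))

# Assumed disturbance: slow sinusoid plus a seeded random walk (illustrative only).
rng = np.random.default_rng(0)
frames = np.arange(5000)
disturbance = np.sin(2 * np.pi * 0.01 * frames) + np.cumsum(0.01 * rng.standard_normal(frames.size))

# Scan the loop gain and keep the value with the smallest residual RMS.
gains = np.linspace(0.05, 0.95, 19)
residual_rms = [rms(simulate_integrator(disturbance, g)) for g in gains]
best_gain = float(gains[int(np.argmin(residual_rms))])
best_rms = min(residual_rms)
open_loop_rms = rms(disturbance)

print(f"open loop RMS: {open_loop_rms:.3f}")
print(f"best gain {best_gain:.2f} gives residual RMS {best_rms:.3f}")
```

Even at the best gain the residual does not vanish, because the correction always lags the disturbance by the loop delay. That lag is the limitation the learned controller is meant to address.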
Self-optimizing adaptive optics control with Reinforcement Learning for high-contrast imaging
Authors:
Rico Landman,
Sebastiaan Y. Haffert,
Vikram M. Radhakrishnan,
Christoph U. Keller
Abstract:
Current and future high-contrast imaging instruments require extreme adaptive optics (XAO) systems to reach contrasts necessary to directly image exoplanets. Telescope vibrations and the temporal error induced by the latency of the control loop limit the performance of these systems. One way to reduce these effects is to use predictive control. We describe how model-free Reinforcement Learning can be used to optimize a Recurrent Neural Network controller for closed-loop predictive control. First, we verify our proposed approach for tip-tilt control in simulations and a lab setup. The results show that this algorithm can effectively learn to mitigate vibrations and reduce the residuals for power-law input turbulence as compared to an optimal gain integrator. We also show that the controller can learn to minimize random vibrations without requiring online updating of the control law. Next, we show in simulations that our algorithm can also be applied to the control of a high-order deformable mirror. We demonstrate that our controller can provide two orders of magnitude improvement in contrast at small separations under stationary turbulence. Furthermore, we show more than an order of magnitude improvement in contrast for different wind velocities and directions without requiring online updating of the control law.
Submitted 24 August, 2021;
originally announced August 2021.
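The abstract above motivates predictive control as a way to reduce the error caused by loop latency and vibrations. The sketch below illustrates that idea only. It is not the Reinforcement Learning method or the Recurrent Neural Network controller of the paper: it swaps in a linear autoregressive predictor fitted by least squares, and the vibration signal, two-frame delay, and model order are assumptions.

```python
import numpy as np

def build_regression(x, order, delay):
    """Pair each window of `order` past samples with the sample `delay` frames ahead."""
    rows = [x[i - order:i] for i in range(order, len(x) - delay + 1)]
    # The last known sample in row i is x[i - 1]; its target is x[i - 1 + delay].
    targets = x[order + delay - 1:]
    return np.array(rows), targets

def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))

# Assumed tip-tilt signal: one vibration line plus measurement noise (illustrative only).
rng = np.random.default_rng(0)
n, delay, order = 4000, 2, 10
frames = np.arange(n)
signal = np.sin(2 * np.pi * 0.05 * frames) + 0.05 * rng.standard_normal(n)

# Fit the predictor on the first half, evaluate on the second half.
half = n // 2
A_train, y_train = build_regression(signal[:half], order, delay)
A_test, y_test = build_regression(signal[half:], order, delay)
coeffs = np.linalg.lstsq(A_train, y_train, rcond=None)[0]

predictive_rms = rms(y_test - A_test @ coeffs)
# Non-predictive reference: hold the last known measurement for `delay` frames.
naive_rms = rms(y_test - A_test[:, -1])

print(f"hold-last-sample error RMS: {naive_rms:.3f}")
print(f"linear predictor error RMS: {predictive_rms:.3f}")
```

For a narrow vibration line, the fitted predictor anticipates the phase advance over the delay, whereas holding the last sample does not. A learned nonlinear controller targets the same effect without assuming a linear model.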