Speakers
Description
In space propulsion, a small set of controllable valves regulates propellant mass flows and thereby drives the engine to its desired operating targets. Reinforcement learning (RL) is promising here because it can learn feedback policies directly from simulator interaction. Prior work has demonstrated the practical viability of deep RL for related control tasks. However, even in low-dimensional actuator settings, such as our 2x2 configuration, training can be sample-inefficient and unstable due to nonlinear, coupled dynamics and constraint-critical transients. This instability increases the risk of converging to suboptimal solutions or violating safety limits. To address these challenges, this study investigates a curriculum learning approach to improve training efficiency and reliability for RL-based control in the LUMEN (Liquid Upper stage deMonstrator ENgine) simulation environment. Using Proximal Policy Optimization (PPO), training is structured as a progression from simplified operating regimes toward increasingly demanding tasks by expanding the operating envelope. Evaluation focuses on training efficiency, transient-response quality, and tracking of constraint violations, complemented by time-series diagnostics and reward-component analyses to interpret learning progress.
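The idea of expanding the operating envelope in stages can be illustrated with a minimal sketch. It is not the study's implementation: the class name `EnvelopeCurriculum`, the normalized target around a nominal value of 1.0, the four stages, and the 0.8 promotion threshold are all hypothetical choices made for illustration. The PPO training loop and the LUMEN simulator are omitted; the sketch only shows how target sampling could widen once the policy performs well on the current stage.

```python
import random
from dataclasses import dataclass


@dataclass
class EnvelopeCurriculum:
    """Hypothetical curriculum: widens the range of sampled operating targets."""

    nominal: float = 1.0         # assumed normalized nominal operating point
    max_half_width: float = 0.4  # assumed half-width of the full envelope
    n_stages: int = 4            # assumed number of curriculum stages
    promote_at: float = 0.8      # assumed success rate needed to advance
    stage: int = 0               # current stage, starting with the narrowest

    def half_width(self) -> float:
        # The envelope grows linearly with the stage index.
        return self.max_half_width * (self.stage + 1) / self.n_stages

    def sample_target(self, rng: random.Random) -> float:
        # Draw an episode target uniformly from the current envelope.
        w = self.half_width()
        return rng.uniform(self.nominal - w, self.nominal + w)

    def update(self, success_rate: float) -> bool:
        # Advance one stage if the policy is good enough; return True on promotion.
        if success_rate >= self.promote_at and self.stage < self.n_stages - 1:
            self.stage += 1
            return True
        return False
```

In a training loop, `sample_target` would be called at each episode reset and `update` after each evaluation window, so that the policy first sees targets near the nominal point and only later the full range.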
| Student | Yes |
|---|---|