RL4AA’26 Poster Abstracts
Background and Motivation
In crisis situations, hospitals can face shortages of medical devices required for the proper treatment of patients. More effective coordination of existing medical resources could therefore
improve patient care as well as the resilience of the healthcare system.
The objective of this work is to develop a spatial simulation that models...
Automating accelerator tuning has been an area of great interest in the accelerator community in recent years. Bayesian Optimisation (BO) has been favoured over Reinforcement Learning (RL) because it requires no lengthy training and is reliable. However, RL has become increasingly viable with access to large training datasets generated by fast, differentiable simulations such as Cheetah.
In this work, we develop Cheetah...
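To illustrate why a differentiable simulation is so valuable for tuning, the toy below replaces a real Cheetah lattice with a one-parameter analytic model (our stand-in, not Cheetah's API) and minimises beam size by gradient descent, the kind of loop that differentiability makes cheap:

```python
import numpy as np

def beam_size(k, k_opt=1.3):
    """Toy differentiable 'simulation': beam size as a smooth function
    of a single quadrupole strength k (illustrative stand-in only)."""
    return (k - k_opt) ** 2 + 0.1

def grad_beam_size(k, k_opt=1.3):
    """Analytic gradient, as automatic differentiation would provide."""
    return 2.0 * (k - k_opt)

# Gradient-based tuning loop enabled by a differentiable model.
k = 0.0
for _ in range(200):
    k -= 0.1 * grad_beam_size(k)
```

With gradients available, each tuning step uses direction information instead of repeated black-box evaluations, which is also what makes large-scale RL training data cheap to generate.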
Modern particle accelerators operate in highly complex, nonlinear, and time-varying regimes, where optimal performance relies on the coordinated tuning of many coupled parameters under uncertainty and noise. Traditional control and optimization strategies based on physics models, linearization, or manual tuning often struggle to adapt in real time to changing beam conditions, hardware drifts,...
This contribution will be based on the paper "Batch spacing optimization by reinforcement learning" (DOI: https://doi.org/10.1103/g9wr-197z):
Beams designated for the LHC are injected into the SPS in multiple batches. Given the tight spacing of 200 ns between these batches, the injection kickers have to be precisely synchronized with the injected beam to minimize injection oscillations. Due...
This study introduces a novel binary trigger-based state representation for deep reinforcement learning (DRL) in stock trading. Unlike conventional approaches using continuous technical indicators (MACD, RSI, CCI, ADX), we encode market state via binary signals: MVX (moving-average crossover) and BOLLX (Bollinger band breakout). We also propose trigger-date filtering, which trains only on...
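The binary encoding described above might be sketched as follows; the window lengths and band width are illustrative defaults, not the values used in the study:

```python
import numpy as np

def binary_triggers(prices, fast=10, slow=30, boll=20, k=2.0):
    """Compute MVX (moving-average crossover) and BOLLX (Bollinger band
    breakout) as binary signals for the most recent bar.

    Window lengths `fast`, `slow`, `boll` and band width `k` are
    illustrative defaults, not the study's parameters.
    """
    prices = np.asarray(prices, dtype=float)

    fast_ma = prices[-fast:].mean()
    slow_ma = prices[-slow:].mean()
    mvx = int(fast_ma > slow_ma)             # 1 if fast MA is above slow MA

    window = prices[-boll:]
    mid, std = window.mean(), window.std()
    bollx = int(prices[-1] > mid + k * std)  # 1 on an upper-band breakout
    return mvx, bollx
```

A state built from such signals is a short binary vector rather than a stack of continuous indicator values, which is the representational change the abstract highlights.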
Robust accelerator control increasingly relies on data-driven optimisation, yet balancing adaptability with safety remains challenging. Simulation-driven, physics-informed reinforcement learning (RL) relies on soft constraints without formal safety guarantees, and classical response-matrix inversion (RMI) becomes suboptimal under noise
and hard actuator limits. Using the AWAKE electron...
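For context, the classical response-matrix inversion (RMI) baseline mentioned above amounts to a pseudoinverse solve. This generic sketch (not AWAKE-specific) also shows the hard actuator clipping under which plain RMI becomes suboptimal:

```python
import numpy as np

def rmi_correction(R, orbit, limit=None):
    """Classical response-matrix inversion: solve orbit ~ R @ kick for
    the corrector kicks that cancel the measured orbit.

    R      : (n_bpms, n_correctors) response matrix
    orbit  : (n_bpms,) measured trajectory readings
    limit  : optional hard actuator bound; naive clipping at this bound
             is what degrades plain RMI near the limits.
    """
    kick = -np.linalg.pinv(R) @ orbit
    if limit is not None:
        kick = np.clip(kick, -limit, limit)  # hard actuator limits
    return kick
```

Because the pseudoinverse amplifies measurement noise and the clipping ignores the coupling between correctors, neither effect is handled optimally, which motivates the learning-based approach.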
The Large Hadron Collider (LHC) requires a collimation system to ensure safe operation with both proton and heavy-ion beams. As of 2023, a crystal collimation scheme using bent silicon crystals was introduced to improve the collimation efficiency for heavy-ion beams. However, drifts in the crystal angular position led to the loss of cleaning performance during physics fills. These drifts are...
This poster presents the design and implementation of a Gated Recurrent Unit (GRU) on Xilinx Versal AI Engines. We outline the mapping of GRU computations to the AI Engine architecture, discuss dataflow and parallelization strategies, and highlight performance considerations for efficient recurrent neural network inference. The design supports unquantized models by leveraging 32-bit...
The muon electric dipole moment (muEDM) experiment at PSI relies on highly sensitive off-axis muon injection into a compact frozen-spin trap. Injection performance depends strongly on magnetic field and material properties that are difficult to characterize with sufficient accuracy prior to commissioning. For a system of this complexity, purely feed-forward optimization of experimental...
Reliable and well‑characterised laser‑driven proton beams are essential for advancing laser‑ion acceleration from fundamental research to practical applications such as medical physics [1]. However, shot-to-shot variability and the lack of robust, non‑invasive diagnostics continue to limit progress. Recent advances in machine learning [2] offer a promising route to overcoming these challenges...
Laser-plasma accelerators (LPAs) still trail conventional accelerators in their ability to generate high-quality electron beams with low shot-to-shot variation. However, with higher repetition rates and longer periods of operation, machine learning is becoming an increasingly viable control tool for improving the stability and reliability of LPAs.
In this context,...
Stripper foil degradation in the Low Energy Ion Ring (LEIR) causes beam distribution drift that progressively degrades performance during multi-turn stacking at flat bottom. World models have emerged as a promising approach for sample-efficient and robust agents, enabling them to improve their behavior by rolling out policies in learned environment models between real interactions, thereby...
Commissioning slow extracted beams from the CERN Super Proton Synchrotron (SPS) to the North Area experimental targets requires trajectory control through multiple transfer lines using corrector magnets—a process that traditionally demands significant expert intervention. Previous work demonstrated the use of reinforcement learning (RL) for automated trajectory correction based on secondary...
The complexity of the CERN and GSI/FAIR accelerator facilities requires a high degree of automation to maximize beam time and performance for physics experiments. Geoff, the Generic Optimization Framework & Frontend, is an open-source tool developed within the EURO-LABS project by CERN and GSI to streamline access to classical and AI-based optimization methods. It provides standardized...
Standard Reinforcement Learning (RL) for trajectory tracking typically relies on myopic state representations, providing agents only with the current target. This forces a reactive control paradigm, resulting in lag and overshoot during dynamic transitions. To address this, we propose augmenting the standard RL state space, which traditionally contains only the current reference, with future...
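A minimal sketch of this state augmentation, assuming a one-dimensional reference trajectory and a fixed lookahead horizon (the function name and end-of-trajectory padding are our illustration):

```python
import numpy as np

def augmented_state(measurement, reference, t, horizon=5):
    """Build an RL observation containing the current measurement, the
    current reference, and the next `horizon` reference points.

    `reference` is the full target trajectory; near its end the last
    value is repeated so the observation keeps a fixed size.
    """
    idx = np.minimum(np.arange(t, t + horizon + 1), len(reference) - 1)
    return np.concatenate(([measurement], np.asarray(reference, dtype=float)[idx]))
```

Exposing the upcoming references lets the policy act ahead of a transition instead of reacting to it, which is the mechanism the abstract credits for removing lag and overshoot.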
Particle accelerators and their design studies generate large amounts of historical data from archived operation logs and high-fidelity simulations, yet most learning-based control strategies still rely on online optimisation, where new data must be collected through direct machine interaction. To make better use of such pre-generated data and avoid additional online exploration, we present a...
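As a minimal illustration of learning from pre-generated data, the sketch below fits a linear policy to logged (state, action) pairs by least squares (the simplest offline baseline, not the method this contribution presents):

```python
import numpy as np

def fit_linear_policy(states, actions):
    """Fit a linear policy a = W @ s by least squares on logged data,
    requiring no further interaction with the machine."""
    S = np.asarray(states, dtype=float)   # (n_samples, state_dim)
    A = np.asarray(actions, dtype=float)  # (n_samples, action_dim)
    W, *_ = np.linalg.lstsq(S, A, rcond=None)
    return W.T                            # (action_dim, state_dim)

# Hypothetical logged data: the logging policy acted as a = -0.5 * s.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 3))
A = -0.5 * S
W = fit_linear_policy(S, A)
```

Anything beyond this baseline (value estimation, pessimism about out-of-distribution actions) is exactly what dedicated offline RL methods add on top of the logged dataset.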
Experimental studies of beauty hadron decays face significant challenges due to the wide range of backgrounds arising from the numerous possible decay channels with similar final states. For a particular signal decay, identifying the most relevant background processes requires a detailed analysis of final state particles, potential misidentifications, and kinematic overlaps,...
Geometry optimization of atomic structures is a common and crucial task in computational chemistry and materials design. Following the learning to optimize paradigm, we propose a new multi-agent reinforcement learning method called Multi-Agent Crystal Structure optimization (MACS) to address periodic crystal structure optimization. MACS treats geometry optimization as a...
The beam intensity in the injector chain at CERN has been nearly doubled as part of the upgrades for the High-Luminosity LHC (HL-LHC). This presents multiple operational challenges. A critical bottleneck is the uncaptured beam created during the transfer from the Proton Synchrotron (PS) to the Super Proton Synchrotron (SPS). Tomographic reconstruction of the longitudinal distribution during...
Multi-Agent Reinforcement Learning (MARL) is an important subfield of Reinforcement Learning, in which multiple agents learn in a shared environment. The simultaneous learning of several players naturally arises in domains like robotics, network communication and traffic control, where agents affect and influence one another. Thus, MARL can simulate real-world problems in a reliable way, and...
For the Beijing Electron-Positron Collider II (BEPCII), operators need to tune the transverse offsets—including displacement and angular deviation (x, x’, y, y’)—of the two beams at the interaction point (IP) to maintain high luminosity as the beam current decays during normal operation. Given that the optimal offset exhibits a non-linear variation with beam current within a single run and...
Development shifts on accelerators are usually time-constrained and infrequent. Meanwhile, control room PCs are not designed for scrappy R&D, and maintaining multiple workflows with Python scripts is prone to error. GUI apps have been successfully deployed in the past to perform optimisation at accelerator facilities. However, bookkeeping can become difficult in complex tasks....
Most accelerator control systems assume that the effect of an action can be evaluated locally and immediately. While greedy approaches work in near-linear regimes and Bayesian Optimisation (BO) is now standard for black-box tuning, both are essentially static optimisers and struggle in dynamic tasks with delayed consequences, where even adaptive BO remains time-myopic and lacks explicit...
Recent developments at the INFN laboratories in Legnaro have demonstrated the effectiveness of Bayesian optimization in automating the tuning process of particle accelerators, yielding substantial improvements in beam quality, significantly reducing setup times, and shortening recovery times following interruptions. Despite these advances, the high-dimensional parameter space defined by...
By scaling accelerator operation to THz frequencies, dielectric-lined waveguides (DLWs) can achieve accelerating gradients far higher than conventional RF structures, while supporting modes that couple longitudinal acceleration with transverse focusing. We propose a reinforcement-learning–based dynamic tuner that, using beam distribution information, adjusts the THz phase and amplitude of...
Designing advanced particle-physics instruments requires navigating a high-dimensional space of discrete and continuous choices while satisfying strict constraints on material, cost, and geometry. In practice, these constraints evolve throughout an experiment’s lifetime, making it insufficient to optimize a single “best” detector configuration. We present a resource-conditioned reinforcement...
We present advancements in the data-driven Model Predictive Control (MPC) framework for optimizing multi-turn injection (MTI) into the SIS18 synchrotron. Building on our prior work on safe, sample-efficient optimization, we systematically investigate the impact of current noise and transverse emittance fluctuations. By incorporating realistic error models derived from dedicated measurements of...
Using domain knowledge to improve deep RL policies remains an open challenge. LEGIBLE mines rules from an RL policy, yielding a partially symbolic representation. These rules describe which decisions the RL policy makes and which it avoids. LEGIBLE then generalizes the mined rules using domain knowledge and, finally, evaluates the generalized rules to determine which generalizations improve...
In space propulsion, a small set of controllable valves regulates propellant mass flows and thereby achieves desired operating targets. Reinforcement learning (RL) is promising here because it can learn feedback policies directly from simulator interaction. Prior work has demonstrated the practical viability of deep RL for related control tasks. However, even in low-dimensional actuator...
In BNL’s Booster, the beam bunches can be split into two or three smaller bunches to reduce their space-charge forces. They are then merged back after acceleration in the Alternating Gradient Synchrotron (AGS). This acceleration with decreased space-charge forces can reduce the final emittance, increasing the luminosity in RHIC and improving proton polarization. Parts of this procedure have...