Symbolic State Seeding Improves Coverage of Reinforcement Learning

Abstract

Due to a limited learning budget, a reinforcement learning agent can explore only the most probable scenarios of an environment with potentially rich and complex dynamics. This may result in a limited understanding of the context and low robustness of the learned policy. A possible approach to address this problem is to explore the interactions between an autonomous agent and its environment in rare but important situations. We propose SymSeed, a method for initializing learning episodes for the class of reinforcement learning problems for which a simulation environment (model) is available. This increases the chance of exposing the agent to interesting states during learning. Inspired by techniques for increasing coverage in software testing, we analyze the simulator implementation using symbolic execution and then generate initial states that ensure the agent explores the simulator dynamics well during learning. We evaluate SymSeed by feeding the generated states into well-known reinforcement learning algorithms, both tabular and function-approximation methods, including vanilla Q-Learning, DQN, PPO, A3C, SAC, TD3, and CAT-RL. In all test cases, combining SymSeed with uniform sampling from the entire state space enables all algorithms to achieve faster convergence and higher success rates than the baseline. The effect is particularly strong in the presence of sparse rewards or local optima.
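To make the episode-initialization idea concrete, the following is a minimal Python sketch of how seeded resets might be combined with uniform sampling; it is an illustration only, not the authors' implementation. It assumes a Gymnasium-style environment whose reset() accepts an options dictionary with an environment-specific "initial_state" key, a hypothetical sample_uniform_state() helper, and a pool of start states produced offline by symbolic execution of the simulator; the 50/50 mixing ratio is likewise an assumption.

```python
import random


class SeededResetWrapper:
    """Mix symbolically derived start states with uniform sampling at reset.

    `seed_states` is assumed to come from an offline symbolic-execution pass
    over the simulator code (e.g., one concrete state per explored path).
    """

    def __init__(self, env, seed_states, seed_prob=0.5, rng=None):
        self.env = env                        # env assumed to support reset(options=...)
        self.seed_states = list(seed_states)  # states derived from path conditions
        self.seed_prob = seed_prob            # chance of starting from a seeded state
        self.rng = rng or random.Random()

    def reset(self):
        if self.seed_states and self.rng.random() < self.seed_prob:
            # Start the episode from a rare-but-important state uncovered
            # by symbolic execution of the simulator implementation.
            start = self.rng.choice(self.seed_states)
        else:
            # Otherwise sample uniformly from the entire state space
            # (hypothetical helper; depends on the environment's API).
            start = self.env.sample_uniform_state()
        return self.env.reset(options={"initial_state": start})
```

Any reinforcement learning algorithm that interacts with the environment only through reset() and step() could then be trained against such a wrapper without modification.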

Publication
Proc. 20th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS 2025). © IEEE/ACM 2025. To appear.