Attribution-based Explanations for Markov Decision Processes

Abstract

Attribution techniques explain the outcome of an AI model by assigning a numerical score to its inputs. So far, these techniques have mainly focused on attributing importance to static input features at a single point in time, and thus fail to generalize to sequential decision-making settings. This paper fills this gap by introducing techniques to generate attribution-based explanations for Markov Decision Processes (MDPs). We give a formal characterization of what attributions should represent in MDPs, focusing on explanations that assign importance scores to both individual states and execution paths. We show how importance scores can be computed by leveraging techniques for strategy synthesis, enabling the efficient computation of these scores despite the non-determinism inherent in an MDP. We evaluate our approach on five case studies, demonstrating its utility in providing interpretable insights into the logic of sequential decision-making agents.
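To give a flavour of what state-level attribution in an MDP can look like, here is a minimal illustrative sketch. It is not the paper's method: it uses a hypothetical three-state deterministic toy MDP and a crude counterfactual proxy, scoring a state by how much the initial state's optimal value drops when the agent is forced to act sub-optimally in that state.

```python
# Illustrative toy example only -- NOT the attribution method of the paper.
# A tiny deterministic MDP with states 0..2 (2 is an absorbing goal)
# and two actions per state. All dynamics below are made up.

GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = ["a", "b"]

# T[(state, action)] = (next_state, reward)
T = {
    (0, "a"): (1, 0.0), (0, "b"): (0, 0.0),
    (1, "a"): (2, 1.0), (1, "b"): (0, 0.0),
    (2, "a"): (2, 0.0), (2, "b"): (2, 0.0),
}

def value_iteration(forced=None, iters=200):
    """Compute state values; `forced` pins a state to a fixed action."""
    forced = forced or {}
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        for s in STATES:
            acts = [forced[s]] if s in forced else ACTIONS
            V[s] = max(T[(s, a)][1] + GAMMA * V[T[(s, a)][0]] for a in acts)
    return V

def importance(s):
    """Drop in the initial state's value when state s is forced
    to its worst action -- a crude counterfactual importance score."""
    V = value_iteration()
    worst = min(ACTIONS, key=lambda a: T[(s, a)][1] + GAMMA * V[T[(s, a)][0]])
    return V[0] - value_iteration(forced={s: worst})[0]

if __name__ == "__main__":
    for s in STATES:
        print(f"state {s}: importance {importance(s):.2f}")
```

States 0 and 1 lie on the only rewarding path and receive a positive score, while the absorbing goal state scores zero; the paper's scores are instead computed efficiently via strategy synthesis rather than by re-solving the MDP per state.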

Publication
Proc. 35th International Joint Conference on Artificial Intelligence (IJCAI 2026). To appear.
Paul Kobialka
PhD Student
Andrea Pferscher
Postdoctoral Research Fellow
Francesco Leofante
Assistant Professor