TY  - JOUR
AU  - Gambo, Ishaya
AU  - Agbonkhese, Christopher
AU  - Aina, Segun
AU  - Otegbayo, Mogboluwaga Tayo
AU  - Adekunle, Johnson Bayo
AU  - Odetola, Israel
AU  - Gambo, Omobola
AU  - Oluwadare, Tolulope
AU  - Odetola, Oluwatoni
PY  - 2024
TI  - Reinforcement Learning in Financial Services: Modelling Payment Switching as a Multi-Armed Bandit Problem
JF  - Journal of Computer Science
VL  - 20
IS  - 11
DO  - 10.3844/jcssp.2024.1519.1529
UR  - https://thescipub.com/abstract/jcssp.2024.1519.1529
AB  - The ever-evolving landscape of digital payments demands continuous innovation and self-improvement. This study addresses this imperative by simulating a model for payment routing, a crucial aspect of the digital payment ecosystem. To achieve this, industry professionals were interviewed to inform the approach, emphasizing data randomization for effective data collection. Using Python, a randomized dataset is created and three Reinforcement Learning (RL) algorithms are implemented and evaluated: Epsilon Greedy, Upper Confidence Bound (UCB), and Thompson Sampling. The paper adopts the Multi-Armed Bandit (MAB) framework to model payment routing as a resource allocation problem, offering a computational approach to real-world resource allocation dilemmas. Through simulation, we eliminate real-time transaction costs, allowing us to focus on algorithmic approaches without implications for customers, businesses, or payment providers. Among the RL algorithms studied, UCB emerges as the most effective in addressing this Multi-Armed Bandit problem, corroborating findings from prior research. This study suggests not only the potential of modeling real-world problems as MAB but also the superior performance of the UCB algorithm in solving RL problems. The paper underscores the need for increased focus on non-consumer-facing aspects of the financial services industry, emphasizing cross-disciplinary research to create infrastructure and software solutions.
Researchers can extend this study by exploring MAB algorithms in various domains with options for system choices. The simulation-based approach offers a cost-effective means of testing system performance and hypotheses across a spectrum of industries, fostering innovation and progress.
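
The setup summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the route success probabilities, the function names, the round count, and the `epsilon` value are all hypothetical, and the UCB variant shown is the standard UCB1 rule, which the abstract does not specify. Each payment route is treated as a bandit arm that returns a reward of 1 when a simulated transaction succeeds.

```python
import math
import random

# Hypothetical per-route success probabilities (illustrative, not from the paper).
ROUTE_SUCCESS = [0.70, 0.80, 0.90]


def epsilon_greedy(counts, successes, t, rng, epsilon=0.1):
    """Explore a random route with probability epsilon, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [s / c if c else 0.0 for s, c in zip(successes, counts)]
    return max(range(len(counts)), key=means.__getitem__)


def ucb1(counts, successes, t, rng):
    """UCB1: try every route once, then pick the highest mean plus confidence bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    scores = [s / c + math.sqrt(2 * math.log(t) / c)
              for s, c in zip(successes, counts)]
    return max(range(len(counts)), key=scores.__getitem__)


def thompson(counts, successes, t, rng):
    """Thompson Sampling with a Beta(1, 1) prior on each route's success rate."""
    samples = [rng.betavariate(s + 1, c - s + 1)
               for s, c in zip(successes, counts)]
    return max(range(len(counts)), key=samples.__getitem__)


def simulate(policy, probs, rounds=5000, seed=0):
    """Route `rounds` simulated transactions; return per-route pulls and total successes."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    successes = [0] * len(probs)
    for t in range(1, rounds + 1):
        arm = policy(counts, successes, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0  # Bernoulli outcome
        counts[arm] += 1
        successes[arm] += reward
    return counts, sum(successes)


if __name__ == "__main__":
    for policy in (epsilon_greedy, ucb1, thompson):
        counts, total = simulate(policy, ROUTE_SUCCESS)
        print(f"{policy.__name__:15s} pulls={counts} successes={total}")
```

Because every transaction is simulated, the three policies can be compared on the same seed at no real transaction cost. Which policy comes out ahead in this sketch depends on the assumed probabilities and horizon, so its output should not be read as reproducing the paper's result.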