Gaussian Process with Vine Copula-Based Context Modeling for Contextual Multi-Armed Bandits

We propose a novel contextual multi-armed bandit (CMAB) framework that integrates copula-based context generation with Gaussian Process (GP) regression for reward modeling, addressing complex dependency structures and uncertainty in sequential decision-making. Context vectors are generated using Gaussian and vine copulas to capture nonlinear dependencies, while arm-specific reward functions are modeled via GP regression with Beta-distributed targets. We evaluate three widely used bandit policies¿Thompson Sampling (TS), (Formula presented.) -Greedy, and Upper Confidence Bound (UCB)¿on simulated environments informed by real-world datasets, including Boston Housing and Wine Quality. The Boston Housing dataset exemplifies heterogeneous decision boundaries relevant to housing-related marketing, while the Wine Quality dataset introduces sensory feature-based arm differentiation. Our empirical results indicate that the (Formula presented.) -Greedy policy consistently achieves the highest cumulative reward and lowest regret across multiple runs, outperforming both GP-based TS and UCB in high-dimensional, copula-structured contexts. These findings suggest that combining copula theory with GP modeling provides a robust and flexible foundation for data-driven sequential experimentation in domains characterized by complex contextual dependencies. © 2025 by the author.