I want to develop a 5G base station (BS) simulator to optimize handovers using Reinforcement Learning (RL), but I face a key challenge: the available datasets provide only UE-side measurements (RSRP, RSRQ) and do not reveal the BS-side decision logic. My goal is to model BS behavior under multiple handover requests, incorporating load balancing and dynamic parameters (hysteresis, offsets), so that I can train an RL agent for autonomous optimization. I need guidance on two points: how to structure the BS decision model, and how to generate realistic training data in the absence of real BS-side datasets.
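To make the question concrete, here is a minimal sketch of what a rule-based BS decision model could look like. It is not taken from any real BS implementation. It assumes a simplified form of the 3GPP A3 entering condition (target RSRP plus a cell offset minus hysteresis must exceed serving RSRP plus the A3 offset) and a plain capacity check for load balancing. The names `Cell`, `HandoverRequest`, `a3_margin_db`, and `decide_handovers`, the capacity field, and all default parameter values are hypothetical. Time-to-trigger and RSRQ are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: str
    capacity: int      # hypothetical: max number of UEs the cell admits
    load: int = 0      # UEs currently served


@dataclass
class HandoverRequest:
    ue_id: str
    serving_rsrp_dbm: float   # UE-reported RSRP of the serving cell
    target_rsrp_dbm: float    # UE-reported RSRP of the candidate cell
    target: Cell


def a3_margin_db(req, hysteresis_db, a3_offset_db, cell_offset_db=0.0):
    """Margin (dB) by which the target clears a simplified A3 condition.

    Positive means: target + cell_offset - hysteresis > serving + a3_offset.
    """
    lhs = req.target_rsrp_dbm + cell_offset_db - hysteresis_db
    rhs = req.serving_rsrp_dbm + a3_offset_db
    return lhs - rhs


def decide_handovers(requests, hysteresis_db=2.0, a3_offset_db=3.0):
    """Return {ue_id: accepted} for a batch of concurrent requests.

    Assumed policy: serve the largest radio margin first and reject
    any request whose target cell is already at capacity.
    """
    ranked = sorted(
        requests,
        key=lambda r: a3_margin_db(r, hysteresis_db, a3_offset_db),
        reverse=True,
    )
    decisions = {}
    for req in ranked:
        margin = a3_margin_db(req, hysteresis_db, a3_offset_db)
        has_room = req.target.load < req.target.capacity
        accepted = margin > 0 and has_room
        if accepted:
            req.target.load += 1   # admitting the UE consumes capacity
        decisions[req.ue_id] = accepted
    return decisions


# Three UEs compete for a target cell with room for only one.
target = Cell("gNB-2", capacity=1)
requests = [
    HandoverRequest("ue-a", -100.0, -92.0, target),
    HandoverRequest("ue-b", -95.0, -85.0, target),
    HandoverRequest("ue-c", -90.0, -88.0, target),
]
decisions = decide_handovers(requests)
```

In a setup like this, `hysteresis_db` and `a3_offset_db` are the knobs an RL agent could adjust as its action, while the per-request decisions produced by replaying UE-side RSRP traces through the rule-based model could serve as synthetic BS-side labels. Whether such labels are realistic enough depends on how closely the assumed policy matches real deployments, which the UE-side datasets alone cannot confirm.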