RL-based recommender system with no simulator


How could I train a model (let's say SlateQ) if I don't have a simulator environment for online training?

I would plug the model into my website and log observations and actions into a replay buffer, but the reward can only be calculated in a daily batch. So I would effectively be training an online model offline. Is that possible?
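
To make the setup concrete, here is a minimal sketch of the logging side, assuming each interaction can be keyed by a user id and a step counter. All names (`DelayedRewardBuffer`, `LoggedStep`, `apply_daily_rewards`) are hypothetical and not part of any library; the choice to treat a step with no logged successor as terminal is also just an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (user_id, step), a hypothetical way to identify an interaction


@dataclass
class LoggedStep:
    obs: List[float]
    slate: Tuple[int, ...]          # item ids shown to the user
    reward: Optional[float] = None  # unknown until the daily batch job runs


class DelayedRewardBuffer:
    """Stores website interactions first and attaches rewards later."""

    def __init__(self) -> None:
        self._steps: Dict[Key, LoggedStep] = {}

    def log(self, user_id: str, step: int, obs, slate) -> None:
        # Called at serving time: the reward is not known yet.
        self._steps[(user_id, step)] = LoggedStep(list(obs), tuple(slate))

    def apply_daily_rewards(self, rewards: Dict[Key, float]) -> int:
        # Called by the daily batch job; returns how many steps were filled in.
        filled = 0
        for key, reward in rewards.items():
            logged = self._steps.get(key)
            if logged is not None and logged.reward is None:
                logged.reward = reward
                filled += 1
        return filled

    def transitions(self):
        # Build (obs, slate, reward, next_obs, done) tuples for rewarded steps only.
        out = []
        for user_id, step in sorted(self._steps):
            logged = self._steps[(user_id, step)]
            if logged.reward is None:
                continue  # reward still pending, not usable for training yet
            nxt = self._steps.get((user_id, step + 1))
            next_obs = nxt.obs if nxt is not None else None
            # Assumption: no logged successor means the episode ended here.
            out.append((logged.obs, logged.slate, logged.reward, next_obs, nxt is None))
        return out


buf = DelayedRewardBuffer()
buf.log("u1", 0, [0.1, 0.2], [3, 7])
buf.log("u1", 1, [0.3, 0.4], [5, 2])
filled = buf.apply_daily_rewards({("u1", 0): 1.0})  # daily batch result
batch = buf.transitions()  # only the step whose reward has arrived
```

The idea would be to train only on the tuples returned by `transitions()`, so steps whose reward is still pending never reach the learner.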

Thank you in advance!