Hello and thank you for any help in advance.
My company has asked me to explore Ray as a solution for a use case we have. Currently we have a supervised learning model as well as a rule-based model in production and we are hoping to utilize RL and Ray. I have been going through the tutorials and have made some progress but I am having issues.
The problem is as follows:
Essentially it is an ad placement problem. We have some space in our company app where we can place one of several informative “ads” that lead to other products. Only one of these ads can be shown at a time when a user opens the app. The feature space is basically demographics plus the past interactions a customer has had with our company. The reward would be simple. Each interaction is a single step: the customer either clicks the presented ad (1 point) or doesn’t (0 points). At least that is what I believe it should be.
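To make the setup concrete, here is a minimal Python sketch of how I picture one interaction: the observation is the customer's features, the action is the index of the ad shown, and the reward is 1 for a click and 0 otherwise. The names (`Interaction`, `reward`, `NUM_ADS`) and the example values are made up for illustration. This is not our production code and it does not use any Ray API.

```python
from dataclasses import dataclass
from typing import List

NUM_ADS = 4  # hypothetical number of candidate ads; the real count differs


@dataclass
class Interaction:
    """One app-open event: features in, one ad out, click or no click."""
    features: List[float]  # demographics + past-interaction features (illustrative)
    ad_shown: int          # index of the ad that was placed, in [0, NUM_ADS)
    clicked: bool          # whether the customer clicked that ad


def reward(interaction: Interaction) -> float:
    """Binary reward: 1.0 for a click, 0.0 otherwise."""
    return 1.0 if interaction.clicked else 0.0


if __name__ == "__main__":
    example = Interaction(features=[0.3, 1.0, 0.0], ad_shown=2, clicked=True)
    print(reward(example))
```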
Now, my problem is that I don't have real-time access to what was clicked. On any given day in production, the only information available on customer interactions (including which ad was clicked) goes up to the previous day. Even what the customer clicks right now would only be available the next day.
On top of that, I have a host of previous interactions with the rule-based algo, a random algo, and the supervised learning algo. I use these to train and refine the current supervised learning algo.
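Here is a rough Python sketch of what the data situation looks like, in case it helps: each logged record carries the day it happened and which algorithm chose the ad, and on any given day only records up to the previous day can be used. The record layout, the field names, and the `available_for_training` helper are assumptions for illustration, not our actual schema, and nothing here is a Ray API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class LoggedInteraction:
    day: date              # day the ad was shown
    features: List[float]  # customer features at that time (illustrative)
    ad_shown: int          # index of the ad that was placed
    clicked: bool          # click outcome, known only the following day
    source_policy: str     # "rule_based", "random", or "supervised"


def available_for_training(
    log: List[LoggedInteraction], today: date
) -> List[LoggedInteraction]:
    """Return only the records whose outcomes are already known,
    i.e. everything up to and including yesterday."""
    cutoff = today - timedelta(days=1)
    return [record for record in log if record.day <= cutoff]


if __name__ == "__main__":
    log = [
        LoggedInteraction(date(2024, 1, 8), [0.2], 1, True, "random"),
        LoggedInteraction(date(2024, 1, 9), [0.7], 0, False, "rule_based"),
        LoggedInteraction(date(2024, 1, 10), [0.5], 2, False, "supervised"),
    ]
    batch = available_for_training(log, today=date(2024, 1, 10))
    print(len(batch), [r.source_policy for r in batch])
```

So any daily training run would only ever see a batch like `batch` above, never the current day's clicks.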
That said, how should I best proceed here? Is there an example or tutorial I should follow? I am unclear on how to get this to train daily, given that results are only available the next day. I also can't seem to find how to train with the historical data I have.