RL-based RecSys for product recommendation using RLlib

Hi,
I’m new here and also to the RL field. I am doing research work on an RL-based recommender system with Amazon review data.

I am trying to develop a custom environment and then use DQN, but I am confused about how to determine my action space, observation space, and reward. I would appreciate any help I can get.

I am using the Grocery and Gourmet category data with the following features.

Review data:
overall, verified, reviewTime, reviewerID, asin, reviewerName, reviewText, vote, image, style

Meta Data:

@kourosh

Thank you

Hey @Prisca ,

Starting from the top level, you need to think about what you want your policy to achieve and what signal you can use to guide your learning (the reward signal).

I don’t know the details of this dataset, but I am thinking that you need to train a policy that suggests items based on some user features, and your learning signal could be the positive/negative sentiment of the captured reviews.

For the observation you can use whatever you have (e.g. a list of product embeddings concatenated with user features), and the action space can be the index of the corresponding product, or some vector embedding that can be used to retrieve the product (a continuous action space).

I hope this gives you enough information to start your project.

Thank you so much @kourosh for your input.