Financial market making using RLlib


This is a theoretical question; apologies if this is the wrong forum.

I am trying to model a trading algorithm with the following dynamics:

S = state of the market
A = the agent's actions
R = reward (profit/loss)

In this setup (let's call it a game), the agent receives a reward if its actions (trades) are profitable.

  1. The transition probabilities are P(S(T) | S(T-1)), i.e. the agent’s actions have no effect on the next state of the market (there are too many market participants).

  2. The profit or loss is not instant. It accrues over a period, so there is a credit assignment problem.
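
To make the two points above concrete, here is a minimal sketch of the dynamics I have in mind, in plain Python. Everything in it is an assumption for illustration: the class name `ExogenousMarketEnv`, the random-walk price, the three discrete actions, and the fixed `holding_period` after which a trade's profit or loss is realized. The `reset`/`step` shape only loosely mimics the usual environment pattern and is not RLlib's API.

```python
import random


class ExogenousMarketEnv:
    """Toy market whose price ignores the agent's actions (hypothetical example)."""

    def __init__(self, horizon=50, holding_period=5, seed=0):
        self.horizon = horizon                # number of steps per episode
        self.holding_period = holding_period  # steps until a trade's P&L is realized
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.price = 100.0
        self.open_trades = []  # entries are (close_time, entry_price, direction)
        return self.price

    def step(self, action):
        # action: -1 = sell, 0 = do nothing, +1 = buy
        if action != 0:
            close_time = self.t + self.holding_period
            self.open_trades.append((close_time, self.price, action))

        # Point 1: the next state depends only on the previous state,
        # P(S(T) | S(T-1)); the action never enters the price update.
        self.price += self.rng.gauss(0.0, 1.0)
        self.t += 1

        # Point 2: the reward is the P&L of trades that close now,
        # i.e. trades opened holding_period steps earlier.
        reward = sum(
            direction * (self.price - entry)
            for close_time, entry, direction in self.open_trades
            if close_time == self.t
        )
        self.open_trades = [tr for tr in self.open_trades if tr[0] > self.t]

        done = self.t >= self.horizon
        return self.price, reward, done
```

With the same seed, the price path is identical whatever the agent does, and a trade opened at step 0 produces no reward until `holding_period` steps later. Trades still open when the episode ends are simply left unrealized in this sketch.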

My questions are:

  1. Is this an MDP, a POMDP or a semi-MDP?
  2. Do any of RLlib's algorithms work in such a scenario?