Probabilistic and deterministic algorithms

Peter_Pirog · May 20, 2022, 12:53pm

I wonder which of the algorithms at https://docs.ray.io/en/latest/rllib/rllib-algorithms.html can be used to solve probabilistic environments such as the rock-paper-scissors game or a contextual bandit. Is there a special configuration parameter for using a probabilistic policy?

What is the best way to define the observation if the observation is always the same?

arturn · May 27, 2022, 4:27pm

Hey @Peter_Pirog,

Have a look at the [contextual bandits section](https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#contextual-bandits). Linear Upper Confidence Bound and Thompson Sampling are both algorithms for solving such environments.
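To give a feel for how Thompson Sampling behaves on a simple bandit, here is a minimal sketch in plain Python. It is not RLlib's implementation (RLlib's bandit algorithms use linear models over the context); it is the simpler Beta-Bernoulli variant for a bandit with no context at all. The function name `thompson_sampling` and the win probabilities are made up for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return the pull count per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # rewards of 1 observed per arm
    failures = [0] * n_arms   # rewards of 0 observed per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible win rate per arm from its Beta posterior
        # (Beta(1, 1) is the uniform prior before any data is seen).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled win rates.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulated environment: a Bernoulli reward with a hidden win rate.
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls


# Hypothetical win probabilities for three arms; arm 2 is the best one.
pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

The policy is probabilistic by construction: the arm is chosen from random posterior samples, so exploration fades naturally as the posteriors sharpen.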

What is the best way to define the observation if the observation is always the same?

What do you mean by that? Can you explain, please?

Peter_Pirog · May 30, 2022, 9:11pm

@arturn, maybe I don't understand correctly, but if I have 3 bandits and can use any of them N times, then in each of the N iterations my observation is the same: the 3 available bandits. For example:
iteration 1 - can use any bandit [1,1,1]
iteration 2 - can use any bandit [1,1,1]
iteration 3 - can use any bandit [1,1,1] etc.

The difference is in the reward, but the observation is always the same. The position of a 1 in the list shows which bandit is available: [0,1,0] means I can use only bandit 1, while bandits 0 and 2 are unavailable.

kourosh · June 1, 2022, 12:46am

@Peter_Pirog You should be able to use the existing contextual bandit algorithms in RLlib for your problem. The contextual bandit setting is essentially a superset of the fixed-observation problem you mentioned. You should just create an environment that always returns a fixed observation. I hope this helps.
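A minimal sketch of such an environment, in plain Python with no dependencies, might look like the following. The class name `FixedObservationBanditEnv` and the win probabilities are hypothetical. The class only mimics the classic Gym `reset()`/`step()` shape; to use it with RLlib you would need to subclass `gym.Env` and declare `observation_space` and `action_space`, and newer Gymnasium versions expect `reset()` to return `(obs, info)` and `step()` to return five values.

```python
import random


class FixedObservationBanditEnv:
    """Three-armed bandit whose observation never changes (illustrative only)."""

    def __init__(self, win_probs=(0.2, 0.5, 0.8), seed=0):
        self.win_probs = win_probs  # hypothetical hidden win rate per arm
        self.rng = random.Random(seed)
        # A 1 marks an available arm; here all three arms are always available.
        self.fixed_obs = [1, 1, 1]

    def reset(self):
        # Return a copy so callers cannot mutate the stored observation.
        return list(self.fixed_obs)

    def step(self, action):
        # Only the reward depends on the chosen arm.
        reward = 1.0 if self.rng.random() < self.win_probs[action] else 0.0
        # Treat each pull as a one-step episode, the usual bandit convention.
        done = True
        return list(self.fixed_obs), reward, done, {}


env = FixedObservationBanditEnv()
obs = env.reset()
next_obs, reward, done, info = env.step(2)
print(obs, next_obs, reward, done)
```

Because the observation carries no information, the agent can only learn from the rewards, which is exactly the plain multi-armed bandit case.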

Peter_Pirog · June 1, 2022, 5:04am

@kourosh, thank you for the answer:

You should just create an environment that always returns a fixed observation.
