Here’s a Team-Battle game out of Abmarl’s GridWorld: Abmarl/team_battle_example.py at main · LLNL/Abmarl · GitHub. This is actually a Dict, not a Tuple; I believe they work very similarly. Actually, I prefer the Dict because the key is descriptive. I trained a similar version of this game with RLlib and got good results.
Can you please share a link to that RLlib code so that I can understand it better?
I don’t have the exact code I used for training that use case available anymore. Here’s an example script using a different multi-agent environment. Although this one does not use a Dict action space, the approach is the same, since that detail is abstracted away in the RLlib framework.
self.action_space = spaces.Dict(
config = ppo.DEFAULT_CONFIG.copy()
config['num_workers'] = 4
config['horizon'] = timestep_limit_per_episode
The ‘episode reward mean’ is not increasing with more iterations for either algorithm. The same algorithm and environment settings converged with a discrete action space. Any suggestions on how to make it work for the Dict space above?
Hi Saurabh, thanks for using RLlib.
There are many, many reasons that may stop an RL stack from learning. We need more information about your setup and the environment to debug this.
Also, for an example of using Dict/Tuple obs and action spaces, have you checked RLlib’s examples folder? E.g.: https://github.com/ray-project/ray/blob/master/rllib/examples/nested_action_spaces.py
Hi. Thanks for responding.
@gjoliver could you please let me know what specific information is needed for debugging so that I can provide it?
I have seen the example you shared, but it is not clear how the following values were decided. Could you please help me understand it better?
"entropy_coeff": 0.00005,  # We don't want high entropy in this Env.
Sure. It would be best if you could share a reproducible script so we can see the environment and test things on our end.
About the configuration parameters, the best way is to do a hyper-parameter search.
They are usually problem dependent.
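As a concrete illustration of such a search, here is a tiny grid-search sketch using only the standard library (the parameter ranges are made up, and `train_and_eval` is a hypothetical stand-in for an actual RLlib training run):

```python
# Sketch of a tiny grid search over the parameters discussed here.
# Values are illustrative, not recommendations.
import itertools

grid = {
    "lr": [1e-5, 5e-5, 1e-4],
    "entropy_coeff": [0.0, 1e-5, 1e-4],
    "num_sgd_iter": [5, 10, 30],
}

def train_and_eval(params):
    # Hypothetical placeholder: a real version would build a trainer
    # with `params` and return the mean episode reward after N iterations.
    return 0.0

best = max(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=train_and_eval,
)
```

In practice you would let Ray Tune manage a search like this rather than looping by hand, but the idea is the same: treat these values as tunables, not constants.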
"entropy_coeff": controls the amount of entropy loss that goes into the total loss. The higher this parameter is, the more stochastic the policy becomes.
"lr": standard learning rate.
"num_envs_per_worker": how many envs to run in a single worker. If you think your workers are underutilized, you can try tuning this parameter.
"num_sgd_iter": number of stochastic gradient descent steps we do for each batch of samples.
"num_workers": the number of rollout workers (parallel environment runners) you want to use for the trainer.
"vf_loss_coeff": similar to entropy_coeff, this controls how much value-function loss goes into the final total loss.
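Putting the coefficients together, the weighting can be sketched as a simplified view of PPO's total loss (all config values below are illustrative):

```python
# Simplified sketch of how the coefficients combine in PPO's total loss:
#   total_loss = policy_loss + vf_loss_coeff * vf_loss - entropy_coeff * entropy
# (the entropy term is subtracted so that higher entropy lowers the loss).
config = {
    "entropy_coeff": 0.00005,   # weight of the entropy bonus
    "vf_loss_coeff": 1.0,       # weight of the value-function loss
    "lr": 5e-5,                 # learning rate
    "num_sgd_iter": 10,         # SGD passes per train batch
    "num_workers": 4,           # parallel rollout workers
    "num_envs_per_worker": 1,   # envs per worker
}

def total_loss(policy_loss, vf_loss, entropy):
    return (policy_loss
            + config["vf_loss_coeff"] * vf_loss
            - config["entropy_coeff"] * entropy)
```

This also shows why raising entropy_coeff pushes the policy toward more stochastic behavior: a higher entropy value reduces the total loss, so the optimizer is rewarded for keeping the action distribution spread out.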