Do RLlib algorithms support both discrete and continuous action spaces simultaneously?

Hey @Jay, what do you mean by simultaneously? RLlib supports both space types, i.e. gym.spaces.Box and gym.spaces.Discrete. It also supports more complex spaces such as gym.spaces.MultiDiscrete and gym.spaces.Tuple.
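
For example, both kinds of actions can live inside a single gym.spaces.Tuple (a quick sketch, independent of any particular environment):

```
import numpy as np
from gym.spaces import Box, MultiDiscrete, Tuple

# One action space holding a continuous and a discrete component at once.
mixed = Tuple([
    Box(-1.0, 1.0, (3,), dtype=np.float32),  # 3 continuous actions in [-1, 1]
    MultiDiscrete([2, 2]),                   # 2 binary discrete branches
])

print(mixed.sample())
# e.g. (array([ 0.12, -0.8 ,  0.5 ], dtype=float32), array([1, 0]))
```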

Hi @kourosh

Thank you for the reply! I am currently working with the ML-Agents Dodgeball environment. The agents have both a continuous and a discrete action space. I tried to implement it as shown below.

However, I am still facing difficulties running the script.

Thanks for the help!

Yes, here is an example:
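
A minimal sketch of a custom env with a mixed action space (the MixedActionEnv name and shapes here are illustrative, not from the Unity setup):

```
import numpy as np
import gym
from gym.spaces import Box, MultiDiscrete, Tuple

class MixedActionEnv(gym.Env):
    # Toy env whose actions have a continuous part and a discrete part.
    def __init__(self, env_config=None):
        self.observation_space = Box(-1.0, 1.0, (4,), dtype=np.float32)
        self.action_space = Tuple([
            Box(-1.0, 1.0, (2,), dtype=np.float32),
            MultiDiscrete([2, 2]),
        ])
        self.t = 0

    def reset(self):
        self.t = 0
        return self.observation_space.sample()

    def step(self, action):
        continuous, discrete = action  # the action arrives as a tuple matching the space
        self.t += 1
        return self.observation_space.sample(), 0.0, self.t >= 100, {}
```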

Hi @kourosh

Can I ask for a MARL example, and could you tell me whether my implementation is faulty in any way? I really appreciate the help!

```
import numpy as np
from gym.spaces import Box, MultiDiscrete, Tuple as TupleSpace
from ray.rllib.policy.policy import PolicySpec

policies = {
    "DodgeballAgent": PolicySpec(
        # Six observation tensors, one per sensor of the Dodgeball agent.
        observation_space=TupleSpace(
            [
                Box(float("-inf"), float("inf"), (3, 8)),
                Box(float("-inf"), float("inf"), (738,)),
                Box(float("-inf"), float("inf"), (252,)),
                Box(float("-inf"), float("inf"), (36,)),
                Box(float("-inf"), float("inf"), (378,)),
                Box(float("-inf"), float("inf"), (20,)),
            ]
        ),
        # Mixed action space: 3 continuous actions plus 2 binary branches.
        action_space=TupleSpace(
            [
                Box(-1.0, 1.0, (3,), dtype=np.float32),
                MultiDiscrete([2, 2]),
            ]
        ),
    ),
}
```
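
As a quick sanity check on these declarations (a sketch; the values in the comments are what sampling should produce, not output from an actual run):

```
spec = policies["DodgeballAgent"]
obs = spec.observation_space.sample()
act = spec.action_space.sample()
print([o.shape for o in obs])  # [(3, 8), (738,), (252,), (36,), (378,), (20,)]
print(act)  # e.g. (array([...], dtype=float32), array([1, 0]))
```

If these shapes do not match what the Unity side actually emits, PPO will typically fail during sample collection.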

```
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(
        "unity3d",
        env_config={
            "file_name": None,
            "episode_horizon": None,
        },
        disable_env_checking=True,
    )
    .framework("torch")
    # For running in the editor, force a single worker (we only have
    # one Unity instance running)!
    .rollouts(
        num_rollout_workers=0,
        rollout_fragment_length=200,
    )
    .training(
        lr=0.0003,
        lambda_=0.95,
        gamma=0.99,
        sgd_minibatch_size=256,
        train_batch_size=4000,
        num_sgd_iter=20,
        clip_param=0.2,
        model={"fcnet_hiddens": [512, 512]},
    )
    .multi_agent(
        policies=policies,
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "DodgeballAgent",
    )
    # Train on a single GPU.
    .resources(num_gpus=1)
)
```
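
Training is then launched along these lines (a sketch; assumes the imports and policies dict from above, and that the Unity editor is already running):

```
import ray

ray.init()
algo = config.build()
for _ in range(10):
    result = algo.train()
    print(result["episode_reward_mean"])
```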

Is it possible for you to share a clean and concise repro script, so that I can run it on my end to test?

This is the code that I tried to run. Note that the error only occurs after I start the Unity Dodgeball environment to train the agents.

This is a screenshot of the error for reference.