[rllib] wrong action dimensions when using dictionary action space

homriidan · July 12, 2021, 8:22pm

I opened an issue a few days ago that has not received a response; maybe someone here can help.
The issue is that attention_net with the PPO policy and a dictionary action space does not seem to work.

github.com/ray-project/ray

[rllib][models][torch][attention_net] wrong action dimensions when using dictionary action space

opened 08:19PM - 09 Jul 21 UTC

homriidan

bug triage

Hi all, I'm trying to use the attention net model with a dictionary action space, without success.

`action space: Dict(AAA:Box(0.0, 3.0, (2,), float32), BBB:Box(0.0, 3.0, (2,), float32), CCC:Box(0.0, 3.0, (2,), float32))`

I ran into a number of issues in the class `AttentionWrapper(TorchModelV2, nn.Module)`:

1. self.action_dim

It does not handle a dictionary action space and falls into the default case.
https://github.com/ray-project/ray/blob/master/rllib/models/torch/attention_net.py#L264

```
if isinstance(action_space, Discrete):
    self.action_dim = action_space.n
elif isinstance(action_space, MultiDiscrete):
    self.action_dim = np.product(action_space.nvec)
elif action_space.shape is not None:
    self.action_dim = int(np.product(action_space.shape))
else:
    self.action_dim = int(len(action_space))  # <-- "TypeError: object of type 'Dict' has no len()"
```

Workaround: I flattened the action space using:
`from ray.rllib.utils.spaces.space_utils import flatten_space`

output:

```
================================================================
action space: Dict(AAA:Box(0.0, 3.0, (2,), float32), BBB:Box(0.0, 3.0, (2,), float32), CCC:Box(0.0, 3.0, (2,), float32))
flatten_action_space: [Box(0.0, 3.0, (2,), float32), Box(0.0, 3.0, (2,), float32), Box(0.0, 3.0, (2,), float32)]
action_dim: 6
================================================================
```

2. It takes the wrong dimensions per action.

Configuration:
Policy: PPOTorchPolicy
Model: Attention Net

```
"model": {
    "use_attention": True,
    "attention_num_transformer_units": 1,
    "attention_dim": 64,
    "attention_num_heads": 1,
    "attention_head_dim": 30,
    "attention_memory_inference": 50,
    "attention_memory_training": 50,
    "attention_position_wise_mlp_dim": 32,
    "attention_init_gru_gate_bias": 2.0,
    "attention_use_n_prev_actions": 15,
    "attention_use_n_prev_rewards": 15,
},
```

Error:

```
File "/home/idanh/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/models/torch/attention_net.py", line 376, in forward
    torch.reshape(
RuntimeError: shape '[-1, 90]' is invalid for input of size 192
```

i.e. instead of taking `attention_use_n_prev_actions*action_dim = 15*6 = 90` it uses `32*6 = 192`.
I looked for the origin of this 32, and it comes from the default batch size of 32 at:
https://github.com/ray-project/ray/blob/master/rllib/policy/policy.py#L643

Why am I getting this default batch size, with no correlation to attention_use_n_prev_actions?

*Ray version and other system information (Python version, TensorFlow version, OS):*
Ray 2.0.0.dev
Ubuntu-18.04
Python 3.8.6

REPRODUCED:
I ran the nested action space example with attention nets activated and with the torch framework (see adapted code below).
ray/rllib/examples/nested_action_spaces.py

```
import argparse
from gym.spaces import Dict, Tuple, Box, Discrete
import os

import ray
import ray.tune as tune
from ray.tune.registry import register_env
from ray.rllib.examples.env.nested_space_repeat_after_me_env import \
    NestedSpaceRepeatAfterMeEnv
from ray.rllib.utils.test_utils import check_learning_achieved

parser = argparse.ArgumentParser()
parser.add_argument(
    "--run",
    type=str,
    default="PPO",
    help="The RLlib-registered algorithm to use.")
parser.add_argument(
    "--framework",
    choices=["tf", "tf2", "tfe", "torch"],
    default="torch",
    help="The DL framework specifier.")
parser.add_argument("--num-cpus", type=int, default=0)
parser.add_argument(
    "--as-test",
    action="store_true",
    help="Whether this script should be run as a test: --stop-reward must "
    "be achieved within --stop-timesteps AND --stop-iters.")
parser.add_argument(
    "--stop-iters",
    type=int,
    default=100,
    help="Number of iterations to train.")
parser.add_argument(
    "--stop-timesteps",
    type=int,
    default=100000,
    help="Number of timesteps to train.")
parser.add_argument(
    "--stop-reward",
    type=float,
    default=0.0,
    help="Reward at which we stop training.")

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init(num_cpus=args.num_cpus or None)
    register_env("NestedSpaceRepeatAfterMeEnv",
                 lambda c: NestedSpaceRepeatAfterMeEnv(c))

    config = {
        "env": "NestedSpaceRepeatAfterMeEnv",
        "env_config": {
            "space": Dict({
                "a": Tuple(
                    [Dict({
                        "d": Box(-10.0, 10.0, ()),
                        "e": Discrete(2)
                    })]),
                "b": Box(-10.0, 10.0, (2, )),
                "c": Discrete(4)
            }),
        },
        "entropy_coeff": 0.00005,  # We don't want high entropy in this Env.
        "gamma": 0.0,  # No history in Env (bandit problem).
        "lr": 0.0005,
        "num_envs_per_worker": 20,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "num_sgd_iter": 4,
        "num_workers": 0,
        "vf_loss_coeff": 0.01,
        "framework": args.framework,
        "model": {
            "use_attention": True,
            "attention_num_transformer_units": 1,
            "attention_dim": 64,
            "attention_num_heads": 1,
            "attention_head_dim": 30,
            "attention_memory_inference": 50,
            "attention_memory_training": 50,
            "attention_position_wise_mlp_dim": 32,
            "attention_init_gru_gate_bias": 2.0,
            "attention_use_n_prev_actions": 15,
            "attention_use_n_prev_rewards": 15,
        },
    }

    stop = {
        "training_iteration": args.stop_iters,
        "episode_reward_mean": args.stop_reward,
        "timesteps_total": args.stop_timesteps,
    }

    results = tune.run(args.run, config=config, stop=stop, verbose=1)

    if args.as_test:
        check_learning_achieved(results, args.stop_reward)

    ray.shutdown()
```

Thanks,
Idan
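The arithmetic behind the reshape error can be checked in isolation. The following is a minimal sketch in plain Python, not RLlib code: `flat_action_dim` and `can_reshape_to_rows` are hypothetical helper names, and the Box shapes are written out by hand from the action space quoted in the issue.

```python
from math import prod

# Shapes of the three Box sub-spaces (AAA, BBB, CCC) from the Dict action
# space in the issue, written out by hand for illustration.
box_shapes = {"AAA": (2,), "BBB": (2,), "CCC": (2,)}


def flat_action_dim(shapes):
    """Sum the element counts of all sub-space shapes (hypothetical helper)."""
    return sum(prod(shape) for shape in shapes.values())


def can_reshape_to_rows(num_elements, row_width):
    """Return True if num_elements can be viewed as shape [-1, row_width]."""
    return num_elements % row_width == 0


action_dim = flat_action_dim(box_shapes)
n_prev_actions = 15  # attention_use_n_prev_actions from the model config
batch_size = 32      # default batch size mentioned in the issue

expected_width = n_prev_actions * action_dim  # row width the model reshapes to
actual_elements = batch_size * action_dim     # elements actually passed in

print(action_dim, expected_width, actual_elements)
print(can_reshape_to_rows(actual_elements, expected_width))
```

Since 192 is not a multiple of 90, a reshape to `[-1, 90]` cannot succeed, which is consistent with the RuntimeError reported above. A buffer that really held 15 previous actions per row for a batch of 32 would contain 32 * 90 elements.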

thanks,
idan

mannyv · July 13, 2021, 3:51am

Hi @homriidan,

The issue is coming from here:

https://github.com/ray-project/ray/blob/e7350ff8282660bdd72250c1a553aebb51b20cf4/rllib/policy/policy.py#L740-L755

    
      
          if isinstance(view_req.space, (gym.spaces.Dict, gym.spaces.Tuple)):
              _, shape = ModelCatalog.get_action_shape(
                  view_req.space, framework=self.config["framework"])
              ret[view_col] = \
                  np.zeros((batch_size, ) + shape[1:], np.float32)
          else:
              # Range of indices on time-axis, e.g. "-50:-1".
              if view_req.shift_from is not None:
                  ret[view_col] = np.zeros_like([[
                      view_req.space.sample()
                      for _ in range(view_req.shift_to -
                                     view_req.shift_from + 1)
                  ] for _ in range(batch_size)])
              # Set of (probably non-consecutive) indices.
              elif isinstance(view_req.shift, (list, tuple)):
                  ret[view_col] = np.zeros_like([[

For some reason, with a Dict or Tuple space the array is not expanded to cover the view requirement. As you can see in the first branch, prev_actions only gets the size of one flattened action (5) and not the size the view requirement asks for (15 * 5 = 75). The first conditional in the else clause a few lines below shows how the view requirement's shift range is used to expand the size for other spaces.

In the call to forward, if you look in the input dictionary to check the shapes you can confirm this:

    input_dict["prev_rewards"].shape
    torch.Size([32, 15])
    input_dict["prev_actions"].shape
    torch.Size([32, 5])  # <- this should be [32, 75]
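To make the expected shape concrete, here is a small sketch. It assumes that the previous actions are flattened to one vector per timestep and that the shift range is inclusive on both ends, as in the `shift_to - shift_from + 1` expression in the snippet above. `expected_prev_actions_shape` is a hypothetical helper for illustration, not part of RLlib.

```python
def expected_prev_actions_shape(batch_size, flat_action_dim, shift_from, shift_to):
    """Shape of a flattened prev_actions input covering an inclusive shift range.

    Hypothetical helper: it mirrors the `shift_to - shift_from + 1` count
    used for non-Dict spaces in the quoted policy.py snippet.
    """
    n_steps = shift_to - shift_from + 1
    return (batch_size, n_steps * flat_action_dim)


# Shift range "-15:-1" with a flattened action size of 5, batch size 32.
print(expected_prev_actions_shape(32, 5, -15, -1))  # (32, 75)

# A single previous step gives the shape that was actually observed.
print(expected_prev_actions_shape(32, 5, -1, -1))   # (32, 5)
```

The second call reproduces the observed `[32, 5]`, which suggests the Dict/Tuple branch behaves as if only one timestep were requested.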

I am not sure whether this is intentional, meaning dictionary spaces are not valid for view requirements that shift the size, or whether it is a bug.

@sven1977 will know more.

sven1977 · July 14, 2021, 8:52pm

Responded on the github issue. Prepping a fix-it PR

sven1977 · July 15, 2021, 10:25am

@mannyv @homriidan ^^
