Initial action for Dict action space

Lars_Simon_Zehnder · July 22, 2021, 9:23pm

Hi folks,

I have the following action space in my gym environment:

from gym.spaces import Box, Dict, Discrete
import numpy as np

# inside the environment's __init__
action_space = {
    "trade": Discrete(3),
    "stop": Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
}
self.action_space = Dict(action_space)

Executing my code gives me an error of the following form:

...
  File "/home/simon/git-projects/learning/.venv/lib/python3.9/site-packages/ray/rllib/utils/debug.py", line 38, in _summarize
    obj.shape, obj.dtype, _summarize(obj[0])))
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

I debugged my code and found that, in my episode data, the actions array contains at its first position (obj[0]) an array of a different size, namely array(0, dtype=object). Here is an example:

array([array(0, dtype=object), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([ 1.08566999, -1.        ]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
       array([nan,  0.]), array([nan,  0.]), array([nan,  0.]),
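The traceback and the odd first entry fit together: `_summarize` recurses into `obj[0]`, and indexing a 0-dimensional array raises exactly this kind of IndexError. A minimal NumPy-only reproduction, assuming the `actions` list below (built from the values shown above) is representative; it does not call RLlib itself:

```python
import numpy as np

# The odd first entry from the episode data: a 0-dimensional object array.
first_action = np.array(0, dtype=object)
# Regular entries: 1-d arrays with two elements each.
later_action = np.array([np.nan, 0.0])
actions = [first_action, later_action, np.array([1.08566999, -1.0])]

print(first_action.ndim, first_action.shape)  # 0 ()
print(later_action.ndim, later_action.shape)  # 1 (2,)

# Locate entries that do not have the expected 1-d shape.
odd_indices = [i for i, a in enumerate(actions) if np.ndim(a) != 1]
print(odd_indices)  # [0]

# Indexing the 0-d array fails the same way as in the traceback.
try:
    first_action[0]
    index_error_raised = False
except IndexError as err:
    index_error_raised = True
    print("IndexError:", err)
```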

It appears that only the first action looks different and causes the problems. I guess this is the initial action and that it is somehow defined in the rllib source code (I think it is here). However, when I run the same code as linked in the previous sentence, I get:

[0.05556222 1.        ]

which appears fine to me.
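The printed value is consistent with a dict sample being flattened into one array with `stop` first and `trade` second. A minimal sketch of such a flattening, assuming keys are visited in sorted order (older gym versions sort the keys of a plain dict passed to `Dict`); `flatten_dict_action` is a hypothetical helper, not RLlib's own function:

```python
import numpy as np


def flatten_dict_action(action):
    """Concatenate the components of a dict action into one 1-d float array.

    Keys are visited in sorted order, so "stop" comes before "trade".
    """
    parts = [np.asarray(action[key], dtype=np.float64).ravel() for key in sorted(action)]
    return np.concatenate(parts)


# Hypothetical sample resembling the one printed above.
sample = {"trade": 1, "stop": np.array([0.05556222], dtype=np.float32)}
flat = flatten_dict_action(sample)
print(flat.shape)  # (2,)
```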

So I wonder: where does this first action come from, and what do I have to change to make my code run again? Maybe @sven1977 or @mannyv know more.

Any help is welcome, and thanks for your time (and the fish)!

mannyv · July 23, 2021, 1:06pm

Hi @Lars_Simon_Zehnder,

rollout.py is used to generate rollouts with a trained policy; it does not do any training, so I don’t think this is where your problem lies, unless I misunderstand your question.

The first place I would look (and maybe you already have) is the reset function of your environment. This is where the first observation comes from. Is it somehow returning something different for the observation than step does?

If I were you I would also be concerned with those nan’s.

Are you handling the combination of Discrete and continuous actions in a special way? I do not remember seeing rllib handle mixed action spaces, but in all honesty it could be there and I have just not encountered it.

Manny

Lars_Simon_Zehnder · July 23, 2021, 1:49pm

Hi @mannyv,

thanks again for your help. I have also made my question above a little more precise. The problem in my episode data is the actions array, which holds the actions of a single episode. The first of these differs from the others, and exactly this first one causes problems in the postprocessing of an episode. So my question now is: what produces this first action, and what do I have to change so that the first action is an array like the others?

The first place I would look (and maybe you already have) is the reset function of your environment. This is where the first observation comes from. Is it somehow returning something different for the observation than step does?

My environment's reset() method does not actually generate any actions; is that something one should do? reset() simply returns the observation, whereas step() additionally returns reward, done, and info. I think that should work.
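For reference, the pre-0.26 gym API described here can be sketched without any gym dependency. `TradingEnvSketch` is a hypothetical stand-in with a placeholder observation and zero reward; the point is only the shape of the interface: `reset()` returns an observation, `step()` returns a 4-tuple, and neither produces an initial action.

```python
import numpy as np


class TradingEnvSketch:
    """Hypothetical stand-in following the old gym API (reset -> obs, step -> 4-tuple)."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def _observe(self):
        # Placeholder observation: the current time step as a single float.
        return np.array([float(self.t)], dtype=np.float32)

    def reset(self):
        # Only an observation is returned; no action is involved here.
        self.t = 0
        return self._observe()

    def step(self, action):
        # action is a dict like {"trade": int, "stop": np.ndarray of shape (1,)}.
        self.t += 1
        reward = 0.0
        done = self.t >= self.episode_length
        info = {"trade": int(action["trade"])}
        return self._observe(), reward, done, info


env = TradingEnvSketch()
obs = env.reset()
obs, reward, done, info = env.step({"trade": 1, "stop": np.array([0.5], dtype=np.float32)})
```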

If I were you I would also be concerned with those nan’s.

The NaNs are intentional. Earlier I used None values to indicate that an agent does nothing (only in the stop component of the action; since it is a float, I could also use 0.0), but this caused errors. Would you rather suggest using 0.0?
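If the NaN sentinel ever reaches a statistic or loss term, it silently turns the result into NaN. A minimal sketch contrasting that with a 0.0 placeholder plus an explicit mask; the values come from the episode data above, and treating trade code 0 as "hold" is an assumption:

```python
import numpy as np

# Stop levels as they might appear in an episode; NaN marks "no stop".
stops = np.array([np.nan, 1.08566999, np.nan])
# Hypothetical trade codes for the same steps; 0 is assumed to mean "hold".
trades = np.array([0, -1, 0])
HOLD = 0

# A NaN sentinel propagates into any aggregate it touches.
print(np.mean(stops))  # nan

# Alternative: store 0.0 for inactive steps and use the trade code as a mask.
stops_zeroed = np.where(trades == HOLD, 0.0, stops)
active = trades != HOLD
print(stops_zeroed[active].mean())
```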

Are you handling the combination of Discrete and continuous actions in a special way? I do not remember seeing rllib handle mixed action spaces, but in all honesty it could be there and I have just not encountered it.

Good question, @mannyv! I actually came up with this because I learned to be type-conformant, and because named components with specific types make the code more readable. trade is an indicator, so I used discrete values. I could of course also use a float for the second value (trade) and simply a Box space with shape=(2,). Maybe this is the reason for this phenomenon.
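If `trade` becomes a float component of a Box, the environment has to map the continuous value back to an indicator. One hypothetical choice is to round and clip; `snap_trade` is an illustrative name, not part of any library:

```python
import numpy as np


def snap_trade(raw):
    """Map a continuous trade signal onto the indicator values -1, 0 and 1."""
    return int(np.clip(np.rint(raw), -1, 1))


print([snap_trade(x) for x in (0.7, -3.2, 0.4, -0.6)])  # [1, -1, 0, -1]
```

Note that `np.rint` rounds halves to even, so 0.5 maps to 0 and 1.5 to 2 (then clipped to 1); explicit thresholds may be preferable if that matters.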

Simon

Lars_Simon_Zehnder · July 23, 2021, 4:22pm

Hi @mannyv,

I have now tested two further action space versions, and your intuition was right: the problem is rooted in the action space type.

I first used a Dict action space:

from gym.spaces import Box, Dict
import numpy as np

action_space = {
    'trade': Box(low=-1, high=1, shape=(1,), dtype=np.int8),
    'stop': Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float64)
}

action_space = Dict(action_space)

This gave the same error as in my initial question. Then I tried a simple Box action space:

action_space = Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float64)

and this worked: training runs through now. Maybe Dict action spaces are not implemented yet? It would be a nice feature, as it allows referring to individual action elements by name and makes code more readable.
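With a flat Box(shape=(2,)) the names are lost, but they can be restored inside the environment with a small wrapper. A minimal sketch using `typing.NamedTuple`; the layout (stop at index 0, trade at index 1) is a hypothetical choice that mirrors the order seen in the episode data:

```python
from typing import NamedTuple

import numpy as np


class TradeAction(NamedTuple):
    """Named view of a flat Box(shape=(2,)) action; the layout is an assumption."""
    stop: float
    trade: float

    @classmethod
    def from_array(cls, arr):
        # Accept anything array-like and insist on exactly two values.
        arr = np.asarray(arr, dtype=np.float64).ravel()
        if arr.shape != (2,):
            raise ValueError(f"expected 2 action values, got shape {arr.shape}")
        return cls(stop=float(arr[0]), trade=float(arr[1]))

    def to_array(self):
        # Inverse of from_array, e.g. for logging or replaying actions.
        return np.array([self.stop, self.trade], dtype=np.float64)


act = TradeAction.from_array(np.array([1.08566999, -1.0]))
print(act.trade)  # -1.0
```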

I will now try whether a Tuple action space works.

mannyv · July 23, 2021, 4:29pm

@Lars_Simon_Zehnder,

This issue might have a similar root cause. You might want to track it and see if it fixes your issue when it is resolved…

github.com/ray-project/ray

[rllib][models][torch][attention_net] wrong action dimensions when using dictionary action space

opened 08:19PM - 09 Jul 21 UTC

homriidan

P2 bug rllib

Hi all, I'm trying to use the attention net model with dictionary action space …without success.

`action space: Dict(AAA:Box(0.0, 3.0, (2,), float32), BBB:Box(0.0, 3.0, (2,), float32), CCC:Box(0.0, 3.0, (2,), float32))`

I ran into a number of issues at class "AttentionWrapper(TorchModelV2, nn.Module)":

1. self.action_dim: it does not handle dictionary action dim and falls into the default case.
https://github.com/ray-project/ray/blob/master/rllib/models/torch/attention_net.py#L264

```
if isinstance(action_space, Discrete):
    self.action_dim = action_space.n
elif isinstance(action_space, MultiDiscrete):
    self.action_dim = np.product(action_space.nvec)
elif action_space.shape is not None:
    self.action_dim = int(np.product(action_space.shape))
else:
    self.action_dim = int(len(action_space))  # <-- "TypeError: object of type 'Dict' has no len()"
```

WA: I flattened the action space using: `from ray.rllib.utils.spaces.space_utils import flatten_space`
output:

```
================================================================
action space: Dict(AAA:Box(0.0, 3.0, (2,), float32), BBB:Box(0.0, 3.0, (2,), float32), CCC:Box(0.0, 3.0, (2,), float32))
flatten_action_space: [Box(0.0, 3.0, (2,), float32), Box(0.0, 3.0, (2,), float32), Box(0.0, 3.0, (2,), float32)]
action_dim: 6
================================================================
```

2. It takes the wrong dimensions per action.

Configuration:
Policy: PPOTorchPolicy
Model: Attention Net

```
"model": {
    "use_attention": True,
    "attention_num_transformer_units": 1,
    "attention_dim": 64,
    "attention_num_heads": 1,
    "attention_head_dim": 30,
    "attention_memory_inference": 50,
    "attention_memory_training": 50,
    "attention_position_wise_mlp_dim": 32,
    "attention_init_gru_gate_bias": 2.0,
    "attention_use_n_prev_actions": 15,
    "attention_use_n_prev_rewards": 15,
},
```

Error:

```
File "/home/idanh/anaconda3/envs/my_env/lib/python3.8/site-packages/ray/rllib/models/torch/attention_net.py", line 376, in forward
    torch.reshape(
RuntimeError: shape '[-1, 90]' is invalid for input of size 192
```

i.e. instead of taking `attention_use_n_prev_actions*action_dim = 15*6 = 90` it uses `32*6 = 192`. I looked for this 32 size origin, and it comes from the default batch size of 32 at: https://github.com/ray-project/ray/blob/master/rllib/policy/policy.py#L643

Why i'm getting this default batch size and without any correlation to attention_use_n_prev_actions?

Ray version and other system information (Python version, TensorFlow version, OS):
Ray 2.0.0.dev
Ubuntu-18.04
Python 3.8.6

REPRODUCED: I ran the nested action space example with attention nets activated and with torch framework (see adapted code below): ray/rllib/examples/nested_action_spaces.py

```
import argparse
from gym.spaces import Dict, Tuple, Box, Discrete
import os

import ray
import ray.tune as tune
from ray.tune.registry import register_env
from ray.rllib.examples.env.nested_space_repeat_after_me_env import \
    NestedSpaceRepeatAfterMeEnv
from ray.rllib.utils.test_utils import check_learning_achieved

parser = argparse.ArgumentParser()
parser.add_argument(
    "--run",
    type=str,
    default="PPO",
    help="The RLlib-registered algorithm to use.")
parser.add_argument(
    "--framework",
    choices=["tf", "tf2", "tfe", "torch"],
    default="torch",
    help="The DL framework specifier.")
parser.add_argument("--num-cpus", type=int, default=0)
parser.add_argument(
    "--as-test",
    action="store_true",
    help="Whether this script should be run as a test: --stop-reward must "
    "be achieved within --stop-timesteps AND --stop-iters.")
parser.add_argument(
    "--stop-iters",
    type=int,
    default=100,
    help="Number of iterations to train.")
parser.add_argument(
    "--stop-timesteps",
    type=int,
    default=100000,
    help="Number of timesteps to train.")
parser.add_argument(
    "--stop-reward",
    type=float,
    default=0.0,
    help="Reward at which we stop training.")

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init(num_cpus=args.num_cpus or None)
    register_env("NestedSpaceRepeatAfterMeEnv",
                 lambda c: NestedSpaceRepeatAfterMeEnv(c))

    config = {
        "env": "NestedSpaceRepeatAfterMeEnv",
        "env_config": {
            "space": Dict({
                "a": Tuple(
                    [Dict({
                        "d": Box(-10.0, 10.0, ()),
                        "e": Discrete(2)
                    })]),
                "b": Box(-10.0, 10.0, (2, )),
                "c": Discrete(4)
            }),
        },
        "entropy_coeff": 0.00005,  # We don't want high entropy in this Env.
        "gamma": 0.0,  # No history in Env (bandit problem).
        "lr": 0.0005,
        "num_envs_per_worker": 20,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
        "num_sgd_iter": 4,
        "num_workers": 0,
        "vf_loss_coeff": 0.01,
        "framework": args.framework,
        "model": {
            "use_attention": True,
            "attention_num_transformer_units": 1,
            "attention_dim": 64,
            "attention_num_heads": 1,
            "attention_head_dim": 30,
            "attention_memory_inference": 50,
            "attention_memory_training": 50,
            "attention_position_wise_mlp_dim": 32,
            "attention_init_gru_gate_bias": 2.0,
            "attention_use_n_prev_actions": 15,
            "attention_use_n_prev_rewards": 15,
        },
    }

    stop = {
        "training_iteration": args.stop_iters,
        "episode_reward_mean": args.stop_reward,
        "timesteps_total": args.stop_timesteps,
    }

    results = tune.run(args.run, config=config, stop=stop, verbose=1)

    if args.as_test:
        check_learning_achieved(results, args.stop_reward)
    ray.shutdown()
```

Thanks,
Idan

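The numbers in the linked issue can be checked with a few lines of arithmetic. This is a pure-Python sketch that does not call RLlib; it only mirrors the flattened action dimension reported by the `flatten_space` workaround and the two sizes from the reshape error:

```python
import math

# Component shapes from the issue's Dict action space (three Box(0.0, 3.0, (2,)) entries).
component_shapes = {"AAA": (2,), "BBB": (2,), "CCC": (2,)}


def flat_action_dim(shapes):
    """Total number of scalars once a Dict of Box spaces is flattened."""
    return sum(math.prod(shape) for shape in shapes.values())


action_dim = flat_action_dim(component_shapes)  # 6
expected_width = 15 * action_dim  # attention_use_n_prev_actions * action_dim = 90
observed_size = 32 * action_dim   # default batch size of 32 * action_dim = 192

# reshape(-1, 90) needs a size divisible by 90; 192 leaves a remainder of 12.
remainder = observed_size % expected_width
```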
Lars_Simon_Zehnder · July 23, 2021, 4:35pm

Hi @mannyv,

indeed interesting and up-to-date. Thank you!

I tested Tuple, and it also works fine with my setup. So Dict action spaces run into problems in postprocessing.
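With a Tuple action space the actions arrive as positional components, so names can be restored inside the environment at the `step()` boundary. A minimal sketch, assuming a hypothetical component order `("trade", "stop")`; the helper names are illustrative only:

```python
import numpy as np

# Hypothetical component order of the Tuple action space, e.g.
# Tuple((Discrete(3), Box(low=0.0, high=np.inf, shape=(1,)))).
ACTION_KEYS = ("trade", "stop")


def tuple_to_named(action):
    """Turn a Tuple-space action into a dict keyed by component name."""
    if len(action) != len(ACTION_KEYS):
        raise ValueError(f"expected {len(ACTION_KEYS)} components, got {len(action)}")
    return dict(zip(ACTION_KEYS, action))


def named_to_tuple(action):
    """Inverse of tuple_to_named."""
    return tuple(action[key] for key in ACTION_KEYS)


named = tuple_to_named((2, np.array([0.5], dtype=np.float32)))
print(named["trade"], named["stop"])
```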

Simon
