Random Exploration error with OrderedDict

runedog48 · December 1, 2021, 2:14am

Edit: I hadn’t used dm-tree/map_structure until now, but it sorts the dict keys, which is what causes the problem. It seems this has already been discussed and won’t be changed within tree. Is my best option at this point to concede to the forced sorting?
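
For reference, conceding to the sorting could look roughly like the sketch below: build the OrderedDict so that its insertion order already equals the sorted key order, so both orderings agree. This is a minimal sketch, not a confirmed fix. `sorted_space_items` is a hypothetical helper name, and the strings are placeholders standing in for the real gym spaces so the snippet runs without gym installed.

```python
from collections import OrderedDict

def sorted_space_items(spaces):
    """Hypothetical helper: return an OrderedDict whose insertion order
    is the same as the sorted order of its keys."""
    return OrderedDict(sorted(spaces.items(), key=lambda item: item[0]))

# Placeholder strings stand in for the actual Box/Discrete spaces (assumption:
# the real code would hold gym space objects here instead).
declared = OrderedDict([
    ("c0", "Discrete(2)"),
    ("c1", "Discrete(2)"),
    ("c2", "Discrete(2)"),
    ("c3", "Discrete(2)"),
    ("ho", "Discrete(2)"),
    ("am", "Box([2.], [1000.], (1,), float32)"),
])

ordered = sorted_space_items(declared)
print(list(ordered))
```

With these key names the Box ends up first only because 'am' happens to sort before 'c0'; a different naming scheme would put it elsewhere, but the declared order and the sorted order would still match.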
Haven’t quite figured this one out yet. I have a Dict containing 1 Box and 5 Discrete actions. While trying to make sure the gym.Dict’s order is constant by using an OrderedDict, I’ve encountered the following error when the Box action space is not the first element of the action space:

tf_policy_template.py 238 __init__
DynamicTFPolicy.__init__(

dynamic_tf_policy.py 325 __init__
self.exploration.get_exploration_action(

stochastic_sampling.py 74 get_exploration_action
return self._get_tf_exploration_action_op(action_distribution,

stochastic_sampling.py 80 _get_tf_exploration_action_op
stochastic_actions = tf.cond(

traceback_utils.py 153 error_handler
raise e.with_traceback(filtered_tb) from None

stochastic_sampling.py 83 <lambda>
self.random_exploration.get_tf_exploration_action_op(

random.py 131 get_tf_exploration_action_op
action = tf.cond(

ValueError:
Outputs of 'true_fn' and 'false_fn' must have the same type(s). Received float32 from 'true_fn' and int64 from 'false_fn'.

Here’s the value of true_fn when it works and when it crashes, followed by the action space I’m trying to get working.

Works
{
  'am': <tf.Tensor 'default_policy/cond/cond/random_uniform:0' shape=(?, 1) dtype=float32>,
  'c0': <tf.Tensor 'default_policy/cond/cond/random_uniform_1:0' shape=(?,) dtype=int64>,
  'c1': <tf.Tensor 'default_policy/cond/cond/random_uniform_2:0' shape=(?,) dtype=int64>,
  'c2': <tf.Tensor 'default_policy/cond/cond/random_uniform_3:0' shape=(?,) dtype=int64>,
  'c3': <tf.Tensor 'default_policy/cond/cond/random_uniform_4:0' shape=(?,) dtype=int64>,
  'ho': <tf.Tensor 'default_policy/cond/cond/random_uniform_5:0' shape=(?,) dtype=int64>
}

Crash
{
  'c0': <tf.Tensor 'default_policy/cond/cond/random_uniform_1:0' shape=(?,) dtype=int64>,
  'c1': <tf.Tensor 'default_policy/cond/cond/random_uniform_2:0' shape=(?,) dtype=int64>,
  'c2': <tf.Tensor 'default_policy/cond/cond/random_uniform_3:0' shape=(?,) dtype=int64>,
  'c3': <tf.Tensor 'default_policy/cond/cond/random_uniform_4:0' shape=(?,) dtype=int64>,
  'ho': <tf.Tensor 'default_policy/cond/cond/random_uniform_5:0' shape=(?,) dtype=int64>,
  'am': <tf.Tensor 'default_policy/cond/cond/random_uniform:0' shape=(?, 1) dtype=float32>
}

Action Space
Dict(
  am:Box([2.], [1000.], (1,), float32),
  c0:Discrete(2),
  c1:Discrete(2),
  c2:Discrete(2),
  c3:Discrete(2),
  ho:Discrete(2)
)

I just noticed the order of the tensor operations while making this post, which is probably the problem. I guess I’ll do some more digging, but any help would be greatly appreciated. Been banging my head against this for a bit.
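
The op-naming pattern in the two dumps above can be reproduced without TensorFlow. The following is only a rough sketch, assuming map_structure calls the function on the values in sorted-key order and then rebuilds the result in the container's own order. `map_in_sorted_order` and `make_fake_random_op` are made-up names for illustration, not dm-tree or RLlib functions.

```python
from collections import OrderedDict
from itertools import count

def map_in_sorted_order(func, mapping):
    # Hypothetical stand-in for the assumed behaviour:
    # call func on the values in sorted-key order...
    results = {key: func(mapping[key]) for key in sorted(mapping)}
    # ...then hand the results back in the mapping's own insertion order.
    return OrderedDict((key, results[key]) for key in mapping)

def make_fake_random_op():
    # Mimics the op names seen in the dumps: random_uniform, random_uniform_1, ...
    counter = count()

    def fake_random_op(dtype):
        index = next(counter)
        name = "random_uniform" if index == 0 else f"random_uniform_{index}"
        return (name, dtype)

    return fake_random_op

# Same key order as the crashing case: the float32 Box ('am') is declared last.
action_dtypes = OrderedDict([
    ("c0", "int64"), ("c1", "int64"), ("c2", "int64"),
    ("c3", "int64"), ("ho", "int64"), ("am", "float32"),
])

ops = map_in_sorted_order(make_fake_random_op(), action_dtypes)
for key, (name, dtype) in ops.items():
    print(key, name, dtype)
```

Here 'am' gets the first op name (random_uniform) even though it sits last in the dict, which matches the Crash dump. The first leaf in container order is int64 while the first op created is float32, so anything comparing the two orderings position by position would see a float32/int64 mismatch. That is consistent with the ValueError, although this sketch does not show where in RLlib such a comparison actually happens.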
