Random Exploration error with OrderedDict

Edit: I hadn’t used dm-tree's map_structure until now, but it sorts the dict's keys, and that is what's causing the problem. It seems this was already discussed and won’t be changed in tree. Is conceding to the forced sorting my best option at this point?
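As the edit says, dm-tree walks dict keys in sorted order rather than in an OrderedDict's insertion order. The plain-Python sketch below only mimics that rule. The helpers `flatten_sorted` and `flatten_insertion` are hypothetical and are not part of dm-tree's API. It also shows what "conceding to the sorting" could look like: if you build the OrderedDict in sorted key order, insertion order and flatten order are the same.

```python
from collections import OrderedDict

def flatten_sorted(mapping):
    """Hypothetical helper mimicking dm-tree's rule for dicts: sorted key order."""
    return [mapping[k] for k in sorted(mapping)]

def flatten_insertion(mapping):
    """Hypothetical helper: walk the mapping in its own (insertion) order."""
    return [mapping[k] for k in mapping]

# Stand-in for the action space from this post: key -> dtype name.
crash_order = OrderedDict([
    ("c0", "int64"), ("c1", "int64"), ("c2", "int64"),
    ("c3", "int64"), ("ho", "int64"), ("am", "float32"),
])

# The two walks disagree because "am" sorts first but was inserted last.
print(flatten_insertion(crash_order)[0], flatten_sorted(crash_order)[0])  # int64 float32

# Workaround sketch: rebuild the OrderedDict in sorted key order so both walks agree.
sorted_order = OrderedDict(sorted(crash_order.items()))
assert flatten_insertion(sorted_order) == flatten_sorted(sorted_order)
print(list(sorted_order))  # ['am', 'c0', 'c1', 'c2', 'c3', 'ho']
```

The sorted order is the same as the key order of the "Works" output further down, where 'am' comes first. Renaming keys so that they sort into the order you want is another way to get the same effect.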




Haven’t quite figured this one out yet. I have a gym Dict action space containing one Box and five Discrete sub-spaces. I'm using an OrderedDict to keep the gym.Dict’s order constant, and whenever the Box space is not the first element of the action space, I get the following error:

tf_policy_template.py 238 __init__
DynamicTFPolicy.__init__(

dynamic_tf_policy.py 325 __init__
self.exploration.get_exploration_action(

stochastic_sampling.py 74 get_exploration_action
return self._get_tf_exploration_action_op(action_distribution,

stochastic_sampling.py 80 _get_tf_exploration_action_op
stochastic_actions = tf.cond(

traceback_utils.py 153 error_handler
raise e.with_traceback(filtered_tb) from None

stochastic_sampling.py 83 <lambda>
self.random_exploration.get_tf_exploration_action_op(

random.py 131 get_tf_exploration_action_op
action = tf.cond(

ValueError: Outputs of 'true_fn' and 'false_fn' must have the same type(s). Received float32 from 'true_fn' and int64 from 'false_fn'.

Here’s the value of true_fn in the working case and in the crashing case, followed by the action space I’m trying to use.

Works
{
  'am': <tf.Tensor 'default_policy/cond/cond/random_uniform:0' shape=(?, 1) dtype=float32>,
  'c0': <tf.Tensor 'default_policy/cond/cond/random_uniform_1:0' shape=(?,) dtype=int64>,
  'c1': <tf.Tensor 'default_policy/cond/cond/random_uniform_2:0' shape=(?,) dtype=int64>,
  'c2': <tf.Tensor 'default_policy/cond/cond/random_uniform_3:0' shape=(?,) dtype=int64>,
  'c3': <tf.Tensor 'default_policy/cond/cond/random_uniform_4:0' shape=(?,) dtype=int64>,
  'ho': <tf.Tensor 'default_policy/cond/cond/random_uniform_5:0' shape=(?,) dtype=int64>
}

Crash
{
  'c0': <tf.Tensor 'default_policy/cond/cond/random_uniform_1:0' shape=(?,) dtype=int64>,
  'c1': <tf.Tensor 'default_policy/cond/cond/random_uniform_2:0' shape=(?,) dtype=int64>,
  'c2': <tf.Tensor 'default_policy/cond/cond/random_uniform_3:0' shape=(?,) dtype=int64>,
  'c3': <tf.Tensor 'default_policy/cond/cond/random_uniform_4:0' shape=(?,) dtype=int64>,
  'ho': <tf.Tensor 'default_policy/cond/cond/random_uniform_5:0' shape=(?,) dtype=int64>,
  'am': <tf.Tensor 'default_policy/cond/cond/random_uniform:0' shape=(?, 1) dtype=float32>
}

Action Space
Dict(
  am:Box([2.], [1000.], (1,), float32),
  c0:Discrete(2),
  c1:Discrete(2),
  c2:Discrete(2),
  c3:Discrete(2),
  ho:Discrete(2)
)

While writing this post I noticed the key order of the tensors: in the crashing case 'am' (the Box) comes last instead of first. That is probably the problem. I’ll keep digging, but any help would be greatly appreciated. I’ve been banging my head against this for a while.
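A side-by-side comparison of the two outputs supports the ordering guess. The error message suggests that tf.cond compares the dtypes of the two branches' outputs pairwise. Suppose one branch is flattened in sorted key order and the other in insertion order. In that case the Box tensor ends up paired with a Discrete tensor. The pure-Python sketch below uses dtypes copied from the outputs above and does not need TensorFlow. It reproduces the exact float32/int64 pair from the error. `first_dtype_mismatch` is a hypothetical helper. Which branch gets sorted is also an assumption; the sketch does not show what RLlib actually does.

```python
from collections import OrderedDict

def first_dtype_mismatch(true_flat, false_flat):
    """Hypothetical helper: return the first differing (true, false) dtype pair, or None."""
    for t, f in zip(true_flat, false_flat):
        if t != f:
            return t, f
    return None

# dtypes from the "Crash" output, in the order printed there ('am' last).
crash = OrderedDict([
    ("c0", "int64"), ("c1", "int64"), ("c2", "int64"),
    ("c3", "int64"), ("ho", "int64"), ("am", "float32"),
])
# Assumption: true_fn's dict is flattened in sorted key order,
# while false_fn's outputs come out in insertion order.
crash_true = [crash[k] for k in sorted(crash)]
crash_false = [crash[k] for k in crash]
print(first_dtype_mismatch(crash_true, crash_false))  # ('float32', 'int64')

# dtypes from the "Works" output ('am' first): sorted and insertion order coincide.
works = OrderedDict([
    ("am", "float32"), ("c0", "int64"), ("c1", "int64"),
    ("c2", "int64"), ("c3", "int64"), ("ho", "int64"),
])
works_true = [works[k] for k in sorted(works)]
works_false = [works[k] for k in works]
print(first_dtype_mismatch(works_true, works_false))  # None
```

Under these assumptions, the crash only needs the Box key's sorted position to differ from its insertion position. That would explain why putting 'am' first makes the error go away.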