ValueError: Outputs of true_fn and false_fn must have the same type: float64, float32

Hello,

Our action space is currently like this:
Tuple(Box(high=[inf], low=[0], shape=(1,), dtype=np.float64), Discrete(2))

When migrating from Ray 0.8.7 to 1.0.1, I get the following error on a PPO run with a custom action distribution:

ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.foreach_policy() (pid=12422, ip=192.168.0.81)
  File "python/ray/_raylet.pyx", line 443, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 477, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 481, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 482, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 436, in ray._raylet.execute_task.function_executor
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 454, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1059, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/policy/tf_policy_template.py", line 206, in __init__
    DynamicTFPolicy.__init__(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 258, in __init__
    self.exploration.get_exploration_action(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 72, in get_exploration_action
    return self._get_tf_exploration_action_op(action_distribution,
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 78, in _get_tf_exploration_action_op
    stochastic_actions = tf.cond(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1392, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1227, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1064, in BuildCondBranch
    original_result = fn()
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 81, in <lambda>
    self.random_exploration.get_tf_exploration_action_op(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/random.py", line 107, in get_tf_exploration_action_op
    action = tf.cond(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1392, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1275, in cond
    raise ValueError(
ValueError: Outputs of true_fn and false_fn must have the same type: float64, float32

I fixed it by setting the dtype of the first item of the action space (the Box) to np.float32. But I was wondering whether np.float64 is unsupported for a specific reason?
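
For reference, here is a minimal sketch of the workaround, assuming the space is built with gym.spaces (the variable name action_space is only illustrative). The only change from the space shown above is the Box dtype:

```python
import numpy as np
from gym.spaces import Box, Discrete, Tuple

# Same action space as above, but the continuous component is declared
# as float32 instead of float64 (this is the only change).
action_space = Tuple((
    Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
    Discrete(2),
))

# Sanity check: the Box component now reports float32.
print(action_space.spaces[0].dtype)
```

With this change both branches of the tf.cond in the traceback appear to produce float32, so the type check no longer fails.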

Thanks

cc @sven1977 do we enforce type casting somewhere?
