Hello,
Our action space is currently like this:
Tuple(Box(high=[inf], low=[0], shape=(1,), dtype=np.float64), Discrete(2))
After migrating from Ray 0.8.7 to 1.0.1, I get the following error when running PPO with a custom action distribution:
ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.foreach_policy() (pid=12422, ip=192.168.0.81)
  File "python/ray/_raylet.pyx", line 443, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 477, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 481, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 482, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 436, in ray._raylet.execute_task.function_executor
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 454, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1059, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/policy/tf_policy_template.py", line 206, in __init__
    DynamicTFPolicy.__init__(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 258, in __init__
    self.exploration.get_exploration_action(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 72, in get_exploration_action
    return self._get_tf_exploration_action_op(action_distribution,
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 78, in _get_tf_exploration_action_op
    stochastic_actions = tf.cond(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1392, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1227, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1064, in BuildCondBranch
    original_result = fn()
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/stochastic_sampling.py", line 81, in <lambda>
    self.random_exploration.get_tf_exploration_action_op(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/ray/rllib/utils/exploration/random.py", line 107, in get_tf_exploration_action_op
    action = tf.cond(
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1392, in cond_for_tf_v2
    return cond(pred, true_fn=true_fn, false_fn=false_fn, strict=True, name=name)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/faten/anaconda3/envs/X/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1275, in cond
    raise ValueError(
ValueError: Outputs of true_fn and false_fn must have the same type: float64, float32
I worked around it by changing the dtype of the first item of the action space (the Box) from np.float64 to np.float32. But I was wondering whether there is a specific reason np.float64 isn’t supported?
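For reference, the workaround looks roughly like this. It is a minimal sketch, assuming the space is built with gym.spaces (the gym package that RLlib 1.0.1 uses); the variable name action_space is only for illustration and our real code is not shown here.

```python
import numpy as np
from gym.spaces import Box, Discrete, Tuple

# Same structure as the original action space, but the Box component
# is declared as float32 instead of float64.
action_space = Tuple((
    Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),
    Discrete(2),
))

# The continuous component now reports float32.
print(action_space.spaces[0].dtype)
```

With this change, both branches of the tf.cond in the traceback presumably produce float32 tensors, which would explain why the error goes away.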
Thanks