Actions created by Policy being modified before input to environment

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


I’m trying to set up a multi-agent training scenario where learned policies train alongside predefined heuristic policies, akin to the examples in “rock_paper_scissors_multiagent”, “multiagent_custom_policy”, and “multiagent_different_spaces…”. I’ve created a hand-made policy that appears to work on its own, but when I incorporate it into training, the actions passed into my environment don’t match what the policy produced. For example, if the policy outputs the action (5, -10, 5), the action passed into my environment is (10000, -10000, 10000), which corresponds to the Box action space bounds I’ve defined. Anyone know what might be the issue?
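A plausible explanation for the symptom, sketched as a small standalone snippet: by default, RLlib treats continuous policy outputs as normalized to [-1, 1] and “unsquashes” them into the Box bounds before calling the environment. Outputs far outside [-1, 1] then saturate at the bounds, which would turn (5, -10, 5) into exactly the space limits. (The function name `unsquash` and the linear-clip form below are an illustrative assumption, not RLlib’s internal code.)

```python
def unsquash(a: float, low: float, high: float) -> float:
    """Map a normalized action in [-1, 1] linearly onto [low, high].

    Values outside [-1, 1] are clipped first, so they saturate at the
    Box bounds -- an assumed sketch of the unsquashing behavior.
    """
    a = max(-1.0, min(1.0, a))
    return low + (a + 1.0) * (high - low) / 2.0

# Reproducing the reported symptom with Box bounds of +/-10000:
print(unsquash(5, -10000, 10000))    # -> 10000.0 (saturates at the upper bound)
print(unsquash(-10, -10000, 10000))  # -> -10000.0 (saturates at the lower bound)
print(unsquash(0, -10000, 10000))    # -> 0.0 (midpoint of the Box)
```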

Using: Ray 2.0.0, Windows, R2D2 as training algorithm



Upon some further testing, it seems this only occurs when I define continuous action spaces. I tested my training setup with the RandomPolicy example, but I’m getting the same issue. I’ve also tried specifying the action distribution as Deterministic from the TF action distributions, but it doesn’t help. Here’s how I set up my policies in the policy map:

PolicySpec(policy_class=RandomPolicy, observation_space=obs_space_low,
           action_space=act_space_low,
           config={"model": {"custom_action_dist": Deterministic}})

Hi @henry_lei,

I think this is the setting you are looking for.
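For readers of this thread: the linked setting is most likely RLlib’s `normalize_actions` flag (this is an assumption; the original link is not preserved here). When it is `True` (the default), RLlib unsquashes policy outputs into the Box bounds before `env.step()`; setting it to `False` passes actions through unchanged. A minimal config sketch, assuming the classic dict-style config used in Ray 2.0:

```python
# Hedged sketch: pass raw policy actions straight to the environment
# instead of unsquashing them from [-1, 1] into the Box bounds.
config = {
    "normalize_actions": False,  # do not treat actions as normalized
    "clip_actions": False,       # related flag: hard-clip into the Box
}

print(config["normalize_actions"])  # -> False
```

With `normalize_actions=False`, a heuristic policy that already emits actions in the environment’s native scale should reach the environment unmodified.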


That was it, thanks!

Looks like this worked.