How severely does this issue affect your experience of using Ray?
High: It blocks me from completing my task.
Hi,
I’m trying to set up a multi-agent training scenario where learned policies train alongside predefined heuristic policies, akin to the “rock_paper_scissors_multiagent”, “multiagent_custom_policy”, and “multiagent_different_spaces…” examples. I’ve created a hand-made policy that seems to work fine on its own, but when I incorporate it into training, the actions passed into my environment don’t match what the policy produces. For example, if the policy outputs (5, -10, 5), the action passed into my environment is (10000, -10000, 10000), which corresponds to the Box action-space bounds I’ve defined. Does anyone know what the issue might be?
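For reference, here’s a simplified sketch of the kind of hand-made policy I mean (the class name, the rule inside, and the action-space bounds are placeholders, not my exact code):

```python
import numpy as np
from gym.spaces import Box
from ray.rllib.policy.policy import Policy

# The env defines a continuous action space roughly like this
# (bounds matching the (10000, -10000, 10000) values I see):
ACTION_SPACE = Box(low=-10000.0, high=10000.0, shape=(3,), dtype=np.float32)


class MyHeuristicPolicy(Policy):
    """Hand-coded, non-learning policy that outputs continuous actions."""

    def compute_actions(
        self,
        obs_batch,
        state_batches=None,
        prev_action_batch=None,
        prev_reward_batch=None,
        **kwargs,
    ):
        # Fixed rule for illustration: always return an action well inside
        # the Box bounds, e.g. (5, -10, 5). During training, the env ends
        # up receiving the bounds themselves instead.
        actions = [np.array([5.0, -10.0, 5.0], dtype=np.float32) for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        return {}  # nothing to learn

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```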
Using: Ray 2.0.0, Windows, R2D2 as the training algorithm
After some further testing, it seems this only occurs when I define continuous action spaces. I also tested my training setup with the RandomPolicy example and got the same issue. I’ve tried specifying the action distribution as “deterministic” from the TF action distributions, but that doesn’t help. Here’s how I set up my policies in the policy map:
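Simplified to show the structure; the env name, policy IDs, spaces, and mapping below are placeholders rather than my exact config:

```python
import numpy as np
from gym.spaces import Box
from ray.rllib.algorithms.r2d2 import R2D2Config
from ray.rllib.policy.policy import PolicySpec

obs_space = Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
act_space = Box(low=-10000.0, high=10000.0, shape=(3,), dtype=np.float32)

config = (
    R2D2Config()
    .environment("my_multi_agent_env")  # placeholder env name
    .multi_agent(
        policies={
            # Learned policy: uses the algorithm's default policy class.
            "learned": PolicySpec(
                observation_space=obs_space, action_space=act_space
            ),
            # Hand-made policy from the sketch above; excluded from training.
            "heuristic": PolicySpec(
                policy_class=MyHeuristicPolicy,  # defined above
                observation_space=obs_space,
                action_space=act_space,
            ),
        },
        policy_mapping_fn=lambda agent_id, episode, worker, **kw: (
            "learned" if agent_id == "agent_0" else "heuristic"
        ),
        policies_to_train=["learned"],
    )
)
```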