How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
The problem I’m trying to solve involves agents taking actions that may affect the environment so that observations for the next agent (whose action has already been computed and stored in agent_dict) are stale. In other words, suppose we are computing the action for agent n. The action taken by agent n impacts the non-agent components of the environment such that the action for agent n+1 was computed using stale observations. What is the proper way to get around this?
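To make the staleness concrete, here is a toy sketch. It is not my real environment: ToyMarket, its price attribute, and the agent names are all made up for illustration, and no Ray code is involved.

```python
class ToyMarket:
    """Hypothetical stand-in for the non-agent part of the environment."""

    def __init__(self, price=100.0):
        self.price = price

    def observe(self):
        # Every agent observes the same shared price.
        return self.price

    def apply(self, action):
        # Each action shifts the shared price.
        self.price += action


market = ToyMarket()

# Parallel-style: both agents observe before anyone acts.
observations = {"agent_0": market.observe(), "agent_1": market.observe()}

# agent_0 acts first and moves the price.
market.apply(5.0)

stale_obs = observations["agent_1"]  # what agent_1's action was based on
fresh_obs = market.observe()         # what agent_1 would observe now
```

Here agent_1's action was chosen from stale_obs, even though the environment has already moved on to fresh_obs by the time that action is applied.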
Here is a simplified env.step() function that I am using:
def step(self, action_dict):
    obs, rew, terminated, truncated, info = {}, {}, {}, {}, {}
    for id, action in action_dict.items():
        if (not self.terminateds[id]) and (not self.truncateds[id]):
            # here we should actually manually compute actions so we can update the observation for the agents (i.e. current price)
            (
                obs[id],
                rew[id],
                terminated[id],
                truncated[id],
                info[id],
            ) = self._actual_agents[id].step(action=action)
    return obs, rew, terminated, truncated, info
To reiterate, each agent’s action might modify the environment. I want each subsequent agent to take the changes in the environment into account in their observations when computing their action. My concern about manually computing actions around line 5 is that Ray may have already “stored” the computed actions somewhere for the algorithms and if I change the action, the rewards that result from taking the new action will be attributed to the outdated action.
Two questions result:
- In a MultiAgentEnv, what happens to the computed actions between the time they are computed and the time the action_dict is passed to env.step()?
- How would one go about updating the observations of each agent so that policies learn correctly and the algorithms associate the returned rewards with the correct input observations and actions? To use the language of PettingZoo, I would like to better understand how to implement an Agent-Environment Cycle (instead of a Parallel environment) using a pure Ray MultiAgentEnv (i.e., without using Ray’s PettingZoo AECEnv wrapper).
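For reference, this is roughly the shape of the turn-based cycle I have in mind. It is only a sketch under assumptions: my understanding is that RLlib requests actions only for agents that appear in the returned observation dict, so returning a single agent's observation per step should yield one action at a time. The class below is plain Python so it runs without Ray (a real version would subclass MultiAgentEnv), and TurnBasedMarketEnv, its price state, and the reward formula are hypothetical placeholders.

```python
class TurnBasedMarketEnv:
    """Plain-Python sketch of an agent-by-agent cycle.

    Assumption: a real version would subclass
    ray.rllib.env.multi_agent_env.MultiAgentEnv.
    """

    def __init__(self, agent_ids=("agent_0", "agent_1"), max_turns=4):
        self.agent_ids = list(agent_ids)
        self.max_turns = max_turns

    def reset(self, *, seed=None, options=None):
        self.price = 100.0
        self.turn = 0
        first = self.agent_ids[0]
        # Only the agent that acts next receives an observation.
        return {first: self.price}, {}

    def step(self, action_dict):
        # Exactly one agent is expected to act per step.
        (agent_id, action), = action_dict.items()
        self.price += action
        self.turn += 1
        done = self.turn >= self.max_turns
        next_agent = self.agent_ids[self.turn % len(self.agent_ids)]
        # The next agent observes the state *after* the previous action.
        obs = {next_agent: self.price}
        rew = {agent_id: -abs(self.price - 100.0)}  # toy reward
        terminated = {"__all__": done}
        truncated = {"__all__": False}
        return obs, rew, terminated, truncated, {}


env = TurnBasedMarketEnv()
obs, _ = env.reset()
obs, rew, terminated, truncated, info = env.step({"agent_0": 5.0})
```

In this sketch agent_1's first observation already reflects agent_0's action, which is the behaviour I am after. What I am unsure about is whether RLlib then pairs each reward with the right observation and action.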