Hi Tyler,
You understood correctly, and I appreciate your response. However, as a beginner with RLlib, I’m still having trouble understanding how to handle asynchrony between agents.
Specifically, Agent 1 becomes available at time t=1, but Agent 2 isn’t yet free to make a decision. Then, at t=2, Agent 2 becomes available while Agent 1 is still occupied.
I’ve previously worked with synchronous multi-agent setups, where both agents make decisions at the same time. However, I’m unclear about what the state dictionary should contain in an asynchronous setup. Does each agent need to return its state at time t, even if it hasn’t acted? And what about rewards?
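To make the question concrete, here is a minimal plain-Python sketch of how I picture the per-step observation dictionary. It assumes that only the agent that becomes free at a given step appears in the dictionary, and I'm not sure that is what RLlib expects. The `ready_at` schedule, the placeholder states, and the `obs_for_step` helper are hypothetical and not part of my actual environment.

```python
# Hypothetical schedule: which agent becomes free at which time step.
ready_at = {1: ["agent_1"], 2: ["agent_2"]}

# Placeholder per-agent states (the real ones would come from the environment).
states = {"agent_1": [0.0, 0.0], "agent_2": [0.0, 0.0]}


def obs_for_step(t, states):
    """Return observations only for the agents that are free to act at time t."""
    return {agent_id: states[agent_id] for agent_id in ready_at.get(t, [])}


for t in (1, 2):
    # Print which agents would receive an observation at each step.
    print(t, obs_for_step(t, states))
```

What I can't work out is whether the agent that is still busy should simply be left out like this, and at which step its reward should be reported.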
Here’s my config dictionary in case it might help:
config = {
    "multiagent": {
        "policies": {
            "agent_1_policy": (
                None,
                cd.observation_space['agent_1'],
                cd.action_space['agent_1'],
                {"model": {"fcnet_hiddens": [16, 16]}},
            ),
            "agent_2_policy": (
                None,
                cd.observation_space['agent_2'],
                cd.action_space['agent_2'],
                {"model": {"fcnet_hiddens": [16, 16]}},
            ),
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
    "env": ABiCi_env,
    "env_config": env_config,
    "exploration_config": {
        "type": "StochasticSampling"
    },
    "lr": learning_rate,
    "gamma": discount_factor
}
And here’s my policy_mapping_fn:
def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    if agent_id == 'agent_1':
        return 'agent_1_policy'
    elif agent_id == 'agent_2':
        return 'agent_2_policy'
    else:
        raise ValueError(f"Invalid agent ID: {agent_id}")
Thank you again!
L.E.O.