How severe does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi all.
This is my first post and I'm still learning. I have read a lot of documentation, code, forum posts, and GitHub issues; posting here is a last resort, and I'm really hoping you can point me in the right direction.
I’m trying to train a single ‘leader’ policy that coordinates a variable number of dumb agents to achieve a goal. The agents only do what the ‘leader’ commands; they have no internal intelligence or logic. Some agents share the same goal, but they do not collaborate: each agent reaches its goal individually. However, there is a set of constraints that the agents must not violate, some of which depend on the state of the other agents. The policy therefore needs to look ahead a couple of steps to make sure no constraint will be violated.
That’s why the action space contains a variable-length list of actions for each agent. The environment applies these sequentially, one per planned future step, checking that no constraints would be violated if the plan were executed as given. The observation space contains the end result of applying the actions in sequence, as well as a log of the outcome of each individual action.
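To make that concrete, the step logic looks roughly like this (a heavily simplified sketch, not my real code; all the _xxx helpers are placeholders for the actual constraint/goal logic):

def step(self, action):
    # `action` is a dict: {agent_id: [planned_action_1, planned_action_2, ...]}
    obs = {}
    for agent_id, planned_actions in action.items():
        state = self._agent_state(agent_id)                    # placeholder: current state of this agent
        execution_log = []
        for planned in planned_actions:                        # variable-length plan, applied in order
            next_state = self._apply(state, planned)           # placeholder
            if self._violates_constraints(agent_id, next_state):  # placeholder; also looks at other agents
                break                                          # abandon the remainder of the plan
            state = next_state
            execution_log.append(self._log_outcome(state))     # placeholder: one obs_3/obs_4 entry per step
        obs[agent_id] = {
            "obs_1": self._end_result(state),                  # placeholder: end result of the whole plan
            "obs_2": self._end_param(state),                   # placeholder
            "execution_log": execution_log,
        }
    reward = self._compute_reward(obs)                         # placeholder: fewer actions => higher reward
    terminated = self._goal_reached(obs)                       # placeholder
    return obs, reward, terminated, False, {}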
My env has action and observation spaces that look something like this:
self.action_space = Dict({
    agent_id: Repeated(child_space=Dict({
        "param_1": Box(low=-18, high=+18, shape=(1,), dtype=np.int8),
        "param_2": Box(low=0, high=30, shape=(1,), dtype=np.int16)
    }), max_len=10)
    for agent_id in self.agents
})
self.observation_space = Dict({
    agent_id: Dict({
        "obs_1": Box(low=-200, high=+200, shape=(2,), dtype=np.float16),
        "obs_2": Box(low=-18, high=+18, shape=(1,), dtype=np.int8),
        "execution_log": Repeated(Dict({
            "obs_3": Box(low=-200, high=+200, shape=(2,), dtype=np.float16),
            "obs_4": Box(low=-18, high=+18, shape=(1,), dtype=np.int8)
        }), max_len=300)
    })
    for agent_id in self.agents
})
Some dict key names have been changed to protect the innocent.
I’m using the APPO algorithm, something like this:
tuner = tune.Tuner(
    "APPO",
    run_config=air.RunConfig(
        stop={"training_iteration": 10},
        verbose=AirVerbosity.DEFAULT,
        progress_reporter=reporter,
        storage_path="%s/build/ray_results" % CWD,
        name="training_log",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=10,
            num_to_keep=10,
            checkpoint_at_end=True)),
    param_space=param_space)
results_grid = tuner.fit()
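For completeness, param_space itself is built along these lines (a simplified sketch with placeholder values; MyLeaderEnv stands in for my actual env class):

from ray.rllib.algorithms.appo import APPOConfig

config = (
    APPOConfig()
    .environment(env=MyLeaderEnv, env_config={"num_agents": 4})  # placeholder env / config
    .framework("torch")
    .rollouts(num_rollout_workers=2)                             # placeholder values
    .training(lr=1e-4, train_batch_size=512)                     # placeholder values
)
param_space = config.to_dict()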
When I run this I get the following error output:
NotImplementedError: Unsupported args: Repeated(Dict('param_1': Box(0, 30, (1,), int16), 'param_2': Box(-18, 18, (1,), int8)), 10) None
Full stack trace:
NotImplementedError: Unsupported args: Repeated(Dict('param_2': Box(0, 30, (1,), int16), 'param_1': Box(-18, 18, (1,), int8)), 10) None
Exception raised in creation task: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=12387, ip=192.168.111.128, actor_id=232a3a67c6b776f4c214377001000000, repr=<ray.rllib.evaluation.rollout_worker._modify_class.<locals>.Class object at 0x7f974042a0b0>)
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 525, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1727, in _update_policy_map
    self._build_policy_map(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1838, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/utils/policy.py", line 142, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/algorithms/appo/appo_torch_policy.py", line 84, in __init__
    TorchPolicyV2.__init__(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 96, in __init__
    model, dist_class = self._init_model_and_dist_class()
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 516, in _init_model_and_dist_class
    model = self.make_model()
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/algorithms/appo/appo_torch_policy.py", line 109, in make_model
    return make_appo_models(self)
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/algorithms/appo/utils.py", line 19, in make_appo_models
    _, logit_dim = ModelCatalog.get_action_dist(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/models/catalog.py", line 322, in get_action_dist
    return ModelCatalog._get_multi_action_distribution(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/models/catalog.py", line 923, in _get_multi_action_distribution
    child_dists_and_in_lens = tree.map_structure(
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/tree/__init__.py", line 435, in map_structure
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/tree/__init__.py", line 435, in <listcomp>
    [func(*args) for args in zip(*map(flatten, structures))])
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/models/catalog.py", line 924, in <lambda>
    lambda s: ModelCatalog.get_action_dist(s, config, framework=framework),
  File "/home/adamcc/leader/venv/lib/python3.10/site-packages/ray/rllib/models/catalog.py", line 350, in get_action_dist
    raise NotImplementedError(
NotImplementedError: Unsupported args: Repeated(Dict('param_1': Box(0, 30, (1,), int16), 'param_2': Box(-18, 18, (1,), int8)), 10) None
It seems that the model is not recognising the Repeated space: I don’t see it handled anywhere in the referenced get_action_dist() method of catalog.py.
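For what it’s worth, the failing call can apparently be reproduced in isolation with something like the probe below (just a quick sketch, not something I rely on; action_child_space is a placeholder for the same child Dict space used in my action space above):

from ray.rllib.models.catalog import ModelCatalog
from ray.rllib.utils.spaces.repeated import Repeated

# Asking the catalog for an action distribution over a Repeated space
# raises the same NotImplementedError as in the stack trace.
dist_class, logit_dim = ModelCatalog.get_action_dist(
    Repeated(child_space=action_child_space, max_len=10),
    config={},
    framework="torch",
)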
To get around this, I replaced the Repeated space in the action_space with a simple Dict:
self.action_space = Dict({
    agent_id: Dict({
        i: Dict({
            "param_1": Box(low=-18, high=+18, shape=(1,), dtype=np.int8),
            "param_2": Box(low=0, high=30, shape=(1,), dtype=np.int16)
        })
        for i in range(0, 10)
    })
    for agent_id in self.agents
})
This is obviously a fugly hack, and it’s clearly not right, because it always requires 10 actions. In reality the policy will be rewarded for achieving the goal with as few actions as possible, with the highest reward for a single action per agent, so it should only generate as many actions as it needs to reach the goal.
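The only mitigation I can think of with this hack is to have the env treat trailing slots as no-ops, something like the sketch below (_is_noop is a placeholder for whatever convention would mark an unused slot), but the policy is still forced to emit 10 actions every step:

def _planned_actions(self, agent_action):
    # Sketch only: cut the fixed-size dict of 10 slots down to the slots that
    # are actually "used", stopping at the first no-op (placeholder convention).
    planned = []
    for i in range(10):
        slot = agent_action[i]
        if self._is_noop(slot):   # placeholder predicate, e.g. param_2 == 0
            break
        planned.append(slot)
    return planned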
Thanks in advance for any hints.
Yours,
Adam