How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I want to specify different algorithm-specific values (gamma and vf_clip_param) for my two policies (both using PPO), since they use different reward shaping and their reward magnitudes differ a lot. However, the algorithm only seems to pick up these values when I change them at the config root; when I change them in the per-policy dicts inside the multiagent config, they are not applied. How am I supposed to set them individually per policy?
My config:
# gym.spaces / numpy are used for the spaces below; MyEnv, env_config,
# filters and policy_mapping_fn are defined elsewhere in my script.
from gym.spaces import Box, Discrete
import numpy as np

train_config = {
"env": MyEnv,
"env_config": env_config,
"entropy_coeff": 0.01,
# "vf_clip_param": 12,
"multiagent": {
"policies": {
"pol_1": (
None,
Box(
low=0,
high=255,
shape=(41, 41, 2), # shape=(HEIGHT, WIDTH, N_CHANNELS)
dtype=np.uint8,
),
Discrete(2),
{
"gamma": 0.95,
"vf_clip_param": 64,
"model": {
"no_final_linear": True,
"conv_filters": filters,
},
},
),
"pol_2": (
None,
Box(
low=0,
high=255,
shape=(41, 41, 2), # shape=(HEIGHT, WIDTH, N_CHANNELS)
dtype=np.uint8,
),
Discrete(5),
{
"gamma": 0.98,
"vf_clip_param": 20,
"model": {
"no_final_linear": True,
"conv_filters": filters
},
},
),
},
"policy_mapping_fn": policy_mapping_fn,
# Optional list of policies to train, or None for all policies.
"policies_to_train": None,
},
"num_workers": 1,
"framework": "torch",
"log_level": "DEBUG",
"num_cpus_for_driver": 2,
"num_cpus_per_worker": 2,
"num_gpus": 0.5,
"num_gpus_per_worker": 0.5,
"disable_env_checking": True,
"train_batch_size": 4000
}
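
A minimal sketch of how I check which values each policy actually ends up with (assuming the pre-2.0 ray.rllib.agents.ppo.PPOTrainer API; newer Ray versions import PPO from ray.rllib.algorithms.ppo instead):

from ray.rllib.agents.ppo import PPOTrainer  # assumption: RLlib < 2.0 API stack

trainer = PPOTrainer(config=train_config)

# Each built policy carries its own merged config. If the per-policy
# overrides from the multiagent dict were applied, this should print
# 0.95 / 64 for pol_1 and 0.98 / 20 for pol_2 rather than the root-level values.
for pol_id in ["pol_1", "pol_2"]:
    pol = trainer.get_policy(pol_id)
    print(pol_id, pol.config["gamma"], pol.config["vf_clip_param"])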