Setting up multiagent config dict with different algorithm parameters

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I want to specify different algorithm-specific values (gamma and vf_clip_param) for each of my policies (both using PPO), since they should have different reward shaping and their reward magnitudes differ considerably. However, the algorithm only seems to pick the values up when I change them at the config root; when I change them inside the multiagent config dict, they are not updated. How am I supposed to set these up individually? (See also the verification sketch after the config below.)

My config:

import numpy as np
from gym.spaces import Box, Discrete

# MyEnv, env_config, filters, and policy_mapping_fn are defined elsewhere.
train_config = {
    "env": MyEnv,
    "env_config": env_config,
    "entropy_coeff": 0.01,
    # "vf_clip_param": 12,
    "multiagent": {
        "policies": {
            "pol_1": (
                None,
                Box(
                    low=0,
                    high=255,
                    shape=(41, 41, 2),  # shape=(HEIGHT, WIDTH, N_CHANNELS)
                    dtype=np.uint8,
                ),
                Discrete(2),
                {
                    # Per-policy overrides of the root-level PPO settings.
                    "gamma": 0.95,
                    "vf_clip_param": 64,
                    "model": {
                        "no_final_linear": True,
                        "conv_filters": filters,
                    },
                },
            ),
            "pol_2": (
                None,
                Box(
                    low=0,
                    high=255,
                    shape=(41, 41, 2),  # shape=(HEIGHT, WIDTH, N_CHANNELS)
                    dtype=np.uint8,
                ),
                Discrete(5),
                {
                    "gamma": 0.98,
                    "vf_clip_param": 20,
                    "model": {
                        "no_final_linear": True,
                        "conv_filters": filters
                    },
                },
            ),
        },
        "policy_mapping_fn": policy_mapping_fn,
        # Optional list of policies to train, or None for all policies.
        "policies_to_train": None,
    },
    "num_workers": 1,
    "framework": "torch",
    "log_level": "DEBUG",
    "num_cpus_for_driver": 2,
    "num_cpus_per_worker": 2,
    "num_gpus": 0.5,
    "num_gpus_per_worker": 0.5,
    "disable_env_checking": True,
    "train_batch_size": 4000
}
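
For reference, here is a minimal way to check whether the per-policy overrides actually take effect: build the algorithm and inspect each policy's merged config. This is only a sketch, assuming the train_config above and RLlib's PPO entry point from ray.rllib.algorithms as available in Ray 2.x (older releases used ray.rllib.agents.ppo.PPOTrainer); pol_1 and pol_2 are the policy IDs defined in the config:

from ray.rllib.algorithms.ppo import PPO

algo = PPO(config=train_config)

for pid in ("pol_1", "pol_2"):
    policy = algo.get_policy(pid)
    # Each policy carries its own merged config dict. If the per-policy
    # overrides were applied, these values should differ between the two
    # policies instead of echoing the root-level defaults.
    print(pid, policy.config["gamma"], policy.config["vf_clip_param"])

In newer Ray versions, the per-policy 4-tuple (policy_class, obs_space, action_space, config) can equivalently be written as a PolicySpec, with the override dict passed as its config argument.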

Hi @Blubberblub,

Before I look into this, could you try this with Ray 2.2?
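
A quick way to do that, assuming a pip-based install:

pip install -U "ray[rllib]==2.2.0"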

Cheers

@arturn Yes, I will upgrade next week and check again.