Train PPO on a cluster with multiple nodes

Hello! I'm training a PPO model to control traffic lights and optimize flow. I have set up my network and everything works, but I cannot figure out how to set up the cluster so that every node actually does the work.

I'm having real trouble finding resources on this.

So I have set up a head node:
ray start --head --port=6379 --num-cpus=14 --num-gpus=1 --dashboard-host=10.10.20.49

started my worker nodes:
ray start --address="10.10.20.49:6379" --num-cpus=14 --num-gpus=1

and then submitted my job:

ray job submit --address="http://10.10.20.49:8265" --runtime-env-json='{"working_dir": "/traffic"}' -- python cluster2.py

I can see everything running, but it only uses resources from the head node where I submitted the job.

I can see:
Logical resource usage: 7.0/56 CPUs, 1.0/4 GPUs (0.0/4.0 accelerator_type:G)
but that's only using the resources of my head node.

In my Python file I have defined:

import ray
from ray.tune import Tuner
from ray.train import RunConfig, CheckpointConfig  # also available under ray.air in older Ray versions
from ray.air.integrations.wandb import WandbLoggerCallback

ray.init(address="auto")
config = training_config()

tuner = Tuner(
    "PPO",
    param_space=config,
    run_config=RunConfig(
        storage_path=storage_path,  # storage_path is set elsewhere in the script
        stop={"training_iteration": 800},
        checkpoint_config=CheckpointConfig(
            checkpoint_frequency=3,
            checkpoint_at_end=True,
        ),
        callbacks=[
            WandbLoggerCallback(
                project="Traffic", api_key="f16cce37f90a1ffe6dc3741f0f86df10cc04baed", log_config=True
            )
        ],
    ),
)
tuner.fit()

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.models import ModelCatalog

# TrafficLightEnv and TrafficLightModel are the project's own env and custom model classes.

def training_config():
    ModelCatalog.register_custom_model("my_traffic_model", TrafficLightModel)

    config = (
        PPOConfig()
        .api_stack(
            enable_rl_module_and_learner=False,
            enable_env_runner_and_connector_v2=False,
        )
        .environment(
            env=TrafficLightEnv,
            env_config={
                "num_lights": 5,
                "sumo_cfg": r"sharedDir/osm.sumocfg",
                "max_steps": 18000,
            },
        )
        .framework("torch")
        .resources(num_gpus=1)
        .env_runners(
            num_env_runners=6,
            num_envs_per_env_runner=1,
            rollout_fragment_length="auto",
            sample_timeout_s=320,
        )
        .training(
            train_batch_size=16000,
            num_epochs=16,
            gamma=0.99,
            lr=1e-4,
            lambda_=0.95,
            clip_param=0.2,
            vf_clip_param=20.0,
            entropy_coeff=0.01,
            kl_coeff=0.1,
            grad_clip=1.0,
        )
        .reporting(
            metrics_num_episodes_for_smoothing=100,
            keep_per_episode_custom_metrics=True,
            min_sample_timesteps_per_iteration=1000,
            min_time_s_per_iteration=1,
            metrics_episode_collection_timeout_s=120,
        )
        .debugging(
            logger_config={
                "type": "ray.tune.logger.TBXLogger",
                "logdir": "./logs",
                "log_metrics_tables": True,
            },
        )
    )

    # attach the custom model to the PPO config
    config.training(
        model={
            "custom_model": "my_traffic_model",
        }
    )

    return config

From what I have gathered, resources and env_runners are “local”, i.e. they describe what each node is going to run.

This runs, but it's only using the resources of the head node that submitted it.

I have searched all over the place and I can see things about using autoscaling, @ray.remote, and placement groups, but I cannot make sense of them. What should I do here? How can I train the model across the cluster?
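
The kind of snippets I keep running into look roughly like this (just an illustration of the pattern; the function here is a made-up placeholder), and I do not see how this is supposed to connect to an RLlib/Tune run:

import ray

ray.init(address="auto")  # connect to the running cluster

# A task that reserves one GPU on whichever node has one free.
@ray.remote(num_gpus=1)
def do_some_work():
    return "done"

# Runs four of these in parallel across the cluster.
print(ray.get([do_some_work.remote() for _ in range(4)]))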

Hi @Magnus411,

You are only asking for 7 workers: 6 env runners and 1 trainer. If you change num_env_runners to 14, you will see a process start on another node.

You could also use Tune to search over hyperparameters; then you would see 4 trials running at a time, one per GPU, each using 7 processes.
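
Roughly along these lines, as a sketch (the learning-rate grid is just an example, and it assumes the training_config() from your post, i.e. 6 env runners and num_gpus=1 per trial):

from ray import tune
from ray.tune import Tuner
from ray.train import RunConfig

config = training_config()

# Grid-search the learning rate; RLlib configs accept Tune search spaces in their fields.
config.training(lr=tune.grid_search([1e-4, 5e-5, 1e-5, 5e-4]))

tuner = Tuner(
    "PPO",
    param_space=config,
    run_config=RunConfig(stop={"training_iteration": 800}),
)

# Each trial asks for 1 GPU (num_gpus=1) plus 7 CPUs (1 trainer + 6 env runners),
# so on a 4-GPU / 56-CPU cluster, 4 trials can run concurrently.
tuner.fit()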

Thanks for the answer @mannyv! Okay, how do I define more trainers then? I have seen something about using .learners, but I think that is alpha and not supported on the old stack.
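
The kind of thing I have seen for the new stack looks roughly like this (untested on my side, and I am not sure the exact method names match my Ray version):

config = (
    PPOConfig()
    # the new API stack has to be enabled for the learner group to be used
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment(env=TrafficLightEnv)
    # one learner (trainer) process per GPU, so 4 learners for 4 GPUs
    .learners(
        num_learners=4,
        num_gpus_per_learner=1,
    )
    .env_runners(num_env_runners=14)
)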

I have tried setting resources(num_gpus=4) to use all 4 of my GPUs, but then I get an error saying I don't have enough GPUs. I figured that resources was for local use?

And how do I use Tune to search over hyperparameters to get all the GPUs working?

I was trying again last night after the post, and I managed to utilize more of the CPUs by setting num_env_runners higher. But I was still not able to utilize the 4 GPUs. The Ray dashboard said something like: {cpus: 24} provisioned, {cpus: 1, gpus: 1} provisioned.
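
Concretely, the only change I made was along these lines (the exact count might have been slightly different):

# bumped up from 6 so that rollout workers get scheduled onto the other nodes as well
config.env_runners(num_env_runners=24)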