GPU utilization is only 1%

Hi, I have a question.
The config is:

import random

import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

config = {
    "env": "nuplan",
    "env_config": None,
    "num_workers": 30,
    # "record_env": False,
    "create_env_on_driver": False,
    "num_envs_per_worker": 1,
    "remote_worker_envs": False,
    "num_gpus": 8,
    "num_cpus_per_worker": 1,
    "num_gpus_per_worker": 0,
    "framework": "torch",
    "model": {
        "fcnet_hiddens": [512, 512, 512, 5123],
    },
    "timesteps_per_iteration": 200,
    # "sample_async": True,
    "horizon": 600,
    "rollout_fragment_length": 4,  # 4 * 30 = 120 samples per sampling round
    "train_batch_size": 24,
    "replay_buffer_config": {
        "_enable_replay_buffer_api": True,
        "type": "MultiAgentReplayBuffer",
        "learning_starts": 10,
        "capacity": 50000,
        "replay_sequence_length": 1,
    },
    # "training_intensity": 10,  # train/collect ratio
    "batch_mode": "truncate_episodes",  # can also be set to "complete_episodes"
}
    
pbt = PopulationBasedTraining(
    time_attr="time_total_s",
    perturbation_interval=7200,
    resample_probability=0.25,
    hyperparam_mutations={
        "lr": lambda: random.uniform(1e-3, 5e-5),
        "gamma": lambda: random.uniform(0.90, 0.99),
    },
)

and I use it like this:

if __name__ == "__main__":
    ray.init(num_gpus=8)
    tune.run(
        "DQN",
        config=config,
        scheduler=pbt,
        num_samples=1,
        metric="episode_reward_mean",
        mode="max",
        local_dir="./results",
    )

But when I look at the GPU utilization, it is only 1% on one GPU and sometimes it stays at 0. Please help me! Thank you!
The version is 2.0.0, torch==1.9.0.

Can you help me?

Can somebody help me…

Please make this a reproducible script.

@wangjunhe8127 Hello, GPU utilization really depends on your workload. You are running DQN with a train_batch_size of 24, which is pretty small, and your network is just an FCNet with 4 layers, so I don’t expect the utilization to be very high anyway. Also, you should set num_gpus to 1 unless you are doing multi-GPU training per Tune trial, which I don’t think is the case here. Increasing train_batch_size should result in higher GPU utilization during training. Sampling is done on the CPU workers, so utilization will drop while sampling. I hope it helps.
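For example, a minimal sketch of that adjustment (the numbers are illustrative assumptions, not values tuned for your environment): keep sampling on the CPU workers, give the single learner one GPU, and raise the train batch so each SGD step actually has work to do.

# Illustrative adjustment of the config above (values are assumptions):
config.update({
    "num_gpus": 1,             # one GPU for the learner of this trial
    "num_gpus_per_worker": 0,  # rollout workers keep sampling on CPU
    "train_batch_size": 512,   # a larger batch keeps the GPU busy during SGD
})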

Thank you very much for your reply. However, I actually do want to use multi-GPU training, so why do you suggest num_gpus=1?

import random

import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

config = {
    "env": "xxx",
    "env_config": None,
    "num_workers": 7,
    "create_env_on_driver": False,
    "num_envs_per_worker": 1,
    "remote_worker_envs": False,
    "num_gpus": 3,
    "num_cpus_per_worker": 1,
    "num_gpus_per_worker": 0,
    "framework": "torch",
    "learning_starts": 20,
    "placement_strategy": "SPREAD",
    "model": {
        "fcnet_hiddens": [512, 512, 512, 5123],
    },
    "timesteps_per_iteration": 200,
    "horizon": 600,
    "rollout_fragment_length": 4,  # 4 * 7 = 28 samples per sampling round
    "train_batch_size": 24,
    "replay_buffer_config": {
        "type": "MultiAgentReplayBuffer",
        "capacity": 50000,
    },
    "batch_mode": "truncate_episodes",
}
    
pbt = PopulationBasedTraining(
    time_attr="time_total_s",
    perturbation_interval=7200,
    resample_probability=0.25,
    hyperparam_mutations={
        "lr": lambda: random.uniform(1e-3, 5e-5),
        "gamma": lambda: random.uniform(0.90, 0.99),
    },
)

if __name__ == "__main__":
    ray.init()
    tune.run(
        "DQN",
        config=config,
        num_samples=4,
        metric="episode_reward_mean",
        mode="max",
        local_dir="./results",
    )
    ray.shutdown()

Thank you! The above is the complete setup except for the environment, and the version of Ray is 1.10.0.

Got it. Before I take an in-depth look at this, is there a particular reason you are using 1.10? From the content of the repro script it looks like you could just as well use 2.0. Have you tried that?

Thanks! Because our ws2 tool only supports v1.10.0. Do you mean it may be a version problem?

RLlib has a lot of moving parts and many things have changed since 1.10.0, but I can’t think of a particular part that would cause this behavior.

You are using a very small batch size, as @kourosh said.

Here’s a more extensive explanation:
The memory usage of your GPUs looks just fine for what you are doing. But RLlib splits the training batch (which is already extremely small) across your GPUs, so every GPU computes its SGD step on only 3 samples (train_batch_size=24, num_gpus=8). For every iteration of the algorithm, you sample (which takes time) and then push very little data through those 8 GPUs - hence the tiny utilization.
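As a rough sketch of that arithmetic with the numbers from the config above, plus one illustrative (not tuned) way to size the batch per GPU:

# Per-GPU share of the training batch under the posted config:
train_batch_size = 24
num_gpus = 8
print(train_batch_size // num_gpus)  # -> 3 samples per GPU per SGD step

# Illustrative fix (values are assumptions, not recommendations):
# scale the train batch with the number of learner GPUs, or just use one GPU.
per_gpu_batch = 256
config["train_batch_size"] = per_gpu_batch * config["num_gpus"]  # 2048 with 8 GPUs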

OK, thank you!