How to reduce GPU memory consumption overhead of actor workers

I am currently using ray[default] 1.3.0 with pytorch 1.3.1 to implement multi-agent reinforcement learning, with each agent running in one Ray actor and all agents sharing a single GPU. The problem I ran into is that no matter how small the neural network or the batch size is, each worker always takes at least 1 GB of GPU memory. This seems like a waste of resources; how can I avoid it?

Can you share a bit more context? What does your configuration look like? What does the training code look like? What kind of GPU are you using?

Also cc @sven1977 for rllib and @ericl for resource allocation

Thank you for your reply! I am using Nvidia Tesla V100 GPUs with CUDA 11.1. The actor class looks like this:

import ray
import torch

@ray.remote(num_gpus=1/8, num_cpus=1)
class Worker(object):
    """
    A ray actor wrapper class for multiprocessing
    """
    def __init__(self, agent_fn, device, **args):
        self.device = torch.device(device)
        self.instance = agent_fn(**args).to(self.device)

    def roll(self, **data):
        return self.instance.roll(**data)

    def updateP(self, **data):
        return self.instance.updateP(**data)

    def updateQ(self, **data):
        self.instance.updateQ(**data)

    def _evalQ(self, **data):
        return self.instance._evalQ(**data)

    def updatePi(self, **data):
        self.instance.updatePi(**data) 

    def act(self, s, deterministic=False, output_distribution=False):
        return self.instance.act(s, deterministic, output_distribution)

And the Ray initialization config is:

os.environ['RAY_OBJECT_STORE_ALLOW_SLOW_STORAGE']='1'
ray.init(ignore_reinit_error = True, num_gpus=1)
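
With this setup Ray can pack up to eight `num_gpus=1/8` actors onto the single visible GPU, but each actor still runs in its own worker process. The hypothetical probe below (my own sketch, not part of the training code) makes that visible:

import os
import ray

# Assumes ray.init(num_gpus=1) from the snippet above has already run;
# ignore_reinit_error makes the call safe to repeat.
ray.init(ignore_reinit_error=True, num_gpus=1)

@ray.remote(num_gpus=1/8)
class GpuProbe:  # hypothetical helper, only for inspection
    def info(self):
        # ray.get_gpu_ids() lists the GPUs assigned to this actor; Ray also sets
        # CUDA_VISIBLE_DEVICES for the worker process accordingly.
        return ray.get_gpu_ids(), os.environ.get("CUDA_VISIBLE_DEVICES"), os.getpid()

probes = [GpuProbe.remote() for _ in range(8)]
print(ray.get([p.info.remote() for p in probes]))
# Expected: every actor reports GPU 0 but a different PID, i.e. eight separate
# processes, and therefore eight separate CUDA contexts once they touch the GPU.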

The GPU memory is consumed immediately after the workers are initialized, so I assume the training code is irrelevant. The workers are created like this:

    self.agents = []
    for i in range(n_agent):
        agent = Worker.remote(agent_fn=agent_fn, device=device, logger=logger.child(f"{i}"), env=env, **agent_args)
        self.agents.append(agent)
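
To rule out the model itself, here is a minimal reproduction sketch with a hypothetical `EmptyWorker` (my own illustration, not part of the training code) that only creates a CUDA context and a single one-element tensor per actor. It should make it easy to check whether the ~1 GiB per process shows up even without any model loaded:

import ray
import torch

# Assumes Ray has already been initialized with num_gpus=1 as shown above.
@ray.remote(num_gpus=1/8, num_cpus=1)
class EmptyWorker:  # hypothetical class, not part of the training code
    def __init__(self):
        # Moving any tensor to the GPU forces CUDA context creation in this process.
        self.t = torch.zeros(1, device="cuda")

    def allocated_mib(self):
        # Memory held by PyTorch tensors only; the CUDA context is not counted here.
        return torch.cuda.memory_allocated() / 1024 ** 2

workers = [EmptyWorker.remote() for _ in range(8)]
print(ray.get([w.allocated_mib.remote() for w in workers]))
# If these numbers are near zero while nvidia-smi still reports a large fixed
# amount per process, the overhead is per-process (CUDA context plus framework
# initialization) rather than tied to the network or batch size.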

Also, when I run the same code on Nvidia P100 GPUs with CUDA 10.2, the overhead per worker drops to about 700 MiB, down from 1085 MiB with the V100 and CUDA 11.1.
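
To compare the two machines directly, a small standalone script (my own sketch, assuming the pynvml package is installed) can measure the bare per-process CUDA footprint without Ray at all: it reads the device's used memory via NVML before and after creating a single CUDA context with PyTorch.

import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def used_mib():
    # Total memory in use on GPU 0 as reported by the NVIDIA driver, in MiB.
    return pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024 ** 2

before = used_mib()
torch.zeros(1, device="cuda")   # triggers CUDA context creation in this process
after = used_mib()
print(f"per-process CUDA context overhead: ~{after - before:.0f} MiB")

Running this on both the V100/CUDA 11.1 machine and the P100/CUDA 10.2 machine should show whether the difference between 1085 MiB and ~700 MiB is simply the size of the bare context on each driver/CUDA combination.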