Example of A3C only using CPU for training

When I follow the tutorial at RLlib: Scalable Reinforcement Learning — Ray v1.4.1,

running

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
tune.run(PPOTrainer, config={"env": "CartPole-v0"}) 

and

rllib train --run=PPO --env=CartPole-v0  # -v [-vv] for verbose,
                                         # --config='{"framework": "tf2", "eager_tracing": True}' for eager,
                                         # --torch to use PyTorch OR --config='{"framework": "torch"}'

both only use the CPU for training. Is there any configuration setting that I am missing for GPU training?

Hi @daniel,

Try this and see if it works:

rllib train --run=PPO --env=CartPole-v0 --ray-num-gpus=1 --config='{"num_gpus": 1}'
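
If you are using the Python API instead, the equivalent would be setting "num_gpus" in the trainer config (a minimal sketch, assuming a single local GPU):

import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

# Make one GPU visible to Ray, then ask RLlib to place the trainer on it.
ray.init(num_gpus=1)
tune.run(PPOTrainer, config={"env": "CartPole-v0", "num_gpus": 1})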

In the case of rllib/agents/a3c/tests/test_a3c.py, when I change the test example to

    def test_a3c_compilation(self):
        """Test whether an A3CTrainer can be built with both frameworks."""
        config = a3c.DEFAULT_CONFIG.copy()
        config["num_workers"] = 2
        config["num_envs_per_worker"] = 2
        config["framework"] = "torch"
        config["num_gpus"] = 1

        num_iterations = 100

        # Test against all frameworks.
        for _ in framework_iterator(config):
            for env in ["CartPole-v0", "Pendulum-v0", "PongDeterministic-v0"]:
                print("env={}".format(env))
                trainer = a3c.A3CTrainer(config=config, env=env)
                # Progress bar; needs `from tqdm import tqdm` at the top of the file.
                for i in tqdm(range(num_iterations)):
                    results = trainer.train()
                    # print(results)
                # check_compute_single_action(trainer)
                trainer.stop()

it returns something like

        else:
            logger.info("TorchPolicy (worker={}) running on {} GPU(s).".format(
                worker_idx if worker_idx > 0 else "local", config["num_gpus"]))
            gpu_ids = ray.get_gpu_ids()
            self.devices = [
                torch.device("cuda:{}".format(i))
                for i, id_ in enumerate(gpu_ids) if i < config["num_gpus"]
            ]
>           self.device = self.devices[0]
E           IndexError: list index out of range

It seems that ray.get_gpu_ids() returned an empty list here, so self.devices ends up empty and indexing it fails: Ray could not locate the available GPU in this case.
How should we use the GPU for training A3C?

@daniel,

Try changing this part to:

    def setUp(self):
        ray.init(num_cpus=4, num_gpus=1)

There have been some bugs reported about detecting GPUs in torch. You could search the issues on GitHub if this does not work.
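
You can also quickly confirm that Ray itself registers the GPU as a cluster resource, independent of torch (a minimal check, assuming a single-machine setup):

import ray

ray.init(num_gpus=1)
# If the GPU was registered, the resource pool should contain a "GPU" entry.
print(ray.cluster_resources())  # e.g. {..., 'GPU': 1.0}
ray.shutdown()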

After changing this line, I still get the same error as above. Is there any other solution I can try?
Thanks!

Hi @daniel,

I think the first thing to do, and maybe you have already done this, is to test that PyTorch is able to see your GPU in the Python interpreter from the environment (venv, conda environment, container, …) you run Ray in.

Something like:

import torch
print(torch.cuda.is_available(), torch.cuda.current_device(),
      torch.cuda.device(0), torch.cuda.device_count(),
      torch.cuda.get_device_name(0))

Here is the returned information:

True 0 <torch.cuda.device object at 0x7f3070e976a0> 2 NVIDIA RTX 3090

OK, then let's wrap it in a Ray remote function call. Try this:

import ray

ray.init(num_gpus=1)


# Request one GPU for this task so Ray assigns it (via CUDA_VISIBLE_DEVICES).
@ray.remote(num_gpus=1)
def test_torchgpu():
    import torch
    print(torch.cuda.is_available(), torch.cuda.current_device(),
          torch.cuda.device(0), torch.cuda.device_count(),
          torch.cuda.get_device_name(0))


# Note: ray.get_gpu_ids() is meant to be called inside a task or actor;
# from the driver it may not reflect a task's GPU assignment.
ray.get(test_torchgpu.remote())
print("ray.get_gpu_ids(): ", ray.get_gpu_ids())
ray.shutdown()

It returns similar output (the (pid=…) prefix comes from the worker process):

(pid=391663) True 0 <torch.cuda.device object at 0x7f4e2e2893c8> 1  NVIDIA RTX 3090

Hi @daniel, looking at the issues on GitHub, it looks like this has been broken for some weeks now. Several people seem to be waiting on a fix.

@daniel,

This would be a good issue to read and track: