As an RLlib end user you can think of Ray as an asynchronous distributed job scheduler. Think of it like this: Ray runs a number of schedulers and code executors on one machine or a whole cluster of machines. You, as an end user, have some code that you want Ray to run for you. Ray's job is to figure out where to run it, based on a set of resource requirements that you provide (through the config). Here is the key part: once Ray starts a job, it has little to no control over how many resources that job actually consumes. You could ask it to run a task that sleeps the whole time, and it would use virtually no resources (CPU or GPU) at all. On the other hand, you could write a piece of code that detects the total number of physical resources and uses all of them for almost the whole time.
So if Ray has little control over what actually runs once it starts, what is the point of all the configuration parameters specifying CPUs and GPUs? This is where its job as a scheduler comes into play. When you start Ray, you either tell it the resources available to it (number of CPUs, GPUs, memory, special licenses, ...) or, if you don't, it uses some built-in rules to detect them automatically. Now Ray is up and running and knows how many resources it is managing.
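For instance, here is a minimal sketch of declaring resources at startup with `ray.init` (the numbers are placeholders, not recommendations):

```python
import ray

# Tell Ray explicitly what it is allowed to manage.
# If you omit these arguments, Ray detects the machine's
# CPUs, GPUs, and memory automatically.
ray.init(num_cpus=8, num_gpus=1)
```

From this point on, Ray schedules against those declared totals, regardless of what the hardware could actually sustain.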
When you ask Ray to train an RL algorithm with RLlib, you are scheduling jobs for Ray to run. To make sure it does not schedule more jobs than can be handled at one time, it needs to know the resource requirements of the job you want to run. Once it knows that, it can decide whether it can start the job now or whether it has to wait until already running tasks complete and free up the required resources.
Let's say you started Ray with 4 CPUs and you have a running task that told Ray it will use 2 of them. If you try to start a second job that needs 3, Ray will schedule your job, but it cannot run: you need 3 CPUs but only 2 are currently free. It will wait until the first job stops so that it can allocate 1 of those CPUs. When you tell Ray that an RLlib job needs 0 CPUs, you are saying that it can run as many such jobs as it wants. You could start 100 RLlib trainings at the same time, but it would surely fail by running out of memory at some point, and would probably run extremely slowly before that. That happens because you did not give Ray a realistic set of requirements for the jobs you actually ran, and it oversubscribed the system.
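The bookkeeping Ray does here can be sketched in plain Python. This is a toy model of the accounting, not Ray's actual implementation:

```python
class SchedulerSketch:
    """Toy model of a scheduler's CPU bookkeeping (illustration only)."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.used_cpus = 0
        self.pending = []  # jobs waiting for resources

    def submit(self, name, cpus_requested):
        """Start the job if enough CPUs are free, otherwise queue it."""
        if self.used_cpus + cpus_requested <= self.total_cpus:
            self.used_cpus += cpus_requested
            return "running"
        self.pending.append((name, cpus_requested))
        return "pending"

    def finish(self, cpus_released):
        """A job completed; free its CPUs and retry the queued jobs."""
        self.used_cpus -= cpus_released
        still_pending = []
        for name, cpus in self.pending:
            if self.used_cpus + cpus <= self.total_cpus:
                self.used_cpus += cpus  # job can start now
            else:
                still_pending.append((name, cpus))
        self.pending = still_pending


sched = SchedulerSketch(total_cpus=4)
sched.submit("job-1", 2)            # runs: 2 of 4 CPUs reserved
status = sched.submit("job-2", 3)   # "pending": only 2 CPUs free
sched.finish(2)                     # job-1 done; job-2 can now run
```

Note that a job declared with 0 CPUs always passes the `submit` check, which is exactly why declaring 0 lets you oversubscribe the system: the scheduler's ledger never fills up, no matter what the jobs actually consume.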
When you tell Ray the driver requires 0 CPUs, you are NOT imposing a constraint on the driver. The driver is going to use however much CPU it needs to execute the instructions required by its implementation. You are merely telling Ray how much the driver will use, so that Ray knows to set that amount aside when asked to run other things.
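In RLlib terms this reservation shows up in the config. A sketch, assuming the key names of older config-dict-style RLlib versions (they may differ in yours):

```python
config = {
    "num_workers": 2,          # rollout workers; each reserves 1 CPU by default
    "num_cpus_for_driver": 1,  # CPUs Ray sets aside for the driver process
    # Setting num_cpus_for_driver to 0 does not throttle the driver;
    # it only tells the scheduler not to reserve anything for it.
}
```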
When you run inference with, and more importantly train, a neural network, it may require a lot of compute. In Ray and RLlib, the neural network is the part of the system that benefits from the specialized capabilities of the GPU. There are a lot of parts of RLlib that will not: running the environment; collecting, storing, and retrieving samples; managing the sample, store, train, update pipeline; communicating over the network with other machines, the Redis backend, etc.; logging data to files. Those kinds of tasks don't run on the GPU and never will.
Given your goal, what you can realistically do is set up a configuration where the amount of CPU used for things other than neural network computations is comparable across runs, and then measure your dependent variables (the amount of time, iterations, samples, ... required) when running the neural network on the CPU versus on the GPU.
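Concretely, the two conditions could be sketched like this, keeping everything identical except where the neural network runs (config key names assumed from older config-dict-style RLlib; adjust to your version):

```python
# Shared settings so the non-GPU work is the same in both runs.
base = {"num_workers": 2, "train_batch_size": 4000}

# Condition A: neural network on CPU.
cpu_run = {**base, "num_gpus": 0}

# Condition B: neural network on GPU; everything else identical.
gpu_run = {**base, "num_gpus": 1}
```

Comparing wall-clock time, iterations, or samples between these two runs then isolates the effect of the GPU on the neural network work, rather than on the parts of RLlib that never touch it.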
Hope this helps,