RLlib, PyTorch and Mac M1 GPUs: No available node types can fulfill resource request

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello Ray community!

A year ago I began experimenting with QMIX on RLlib to control the MATSim traffic simulator. Since then, I have purchased a 2021 MacBook 14" which has a 10-core M1 CPU and 10 GPUs. Prototyping my multiagent scenario could benefit greatly from the speedup of the GPU cores. However, I can’t seem to get Ray to recognize that the GPUs are available. I recognize that, while M1 support currently exists for both Ray and PyTorch, it is experimental.

Below, the first section shows my env setup, and the second section shows the hello-world-flavored tests I ran to confirm PyTorch, RLlib, and finally, GPU utilization.

My environment

Based on these installation instructions:

# miniforge
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
zsh Miniforge3-MacOSX-arm64.sh
rm Miniforge3-MacOSX-arm64.sh

# pytorch
conda install pytorch -c pytorch-nightly

# ray
pip uninstall grpcio
conda install grpcio=1.43.0
pip install ray "ray[rllib]"

1. Confirm PyTorch sees GPUs (OK)

$ python
Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:33) 
[Clang 13.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.mps.is_available()
True

2. Run CartPole with RLlib on PyTorch using CPUs (OK)

Next I confirm I can run the cartpole example with torch ("--framework torch") and otherwise default arguments. This terminates normally after 26 seconds with a reward of 156.79 after 11 iterations/44k time steps:

== Status ==
Current time: 2022-07-08 10:03:22 (running for 00:00:30.28)
Memory usage on this node: 10.8/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/10 CPUs, 0/0 GPUs, 0.0/5.99 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rjf/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)

3. Run CartPole with RLlib on PyTorch using GPUs (FAILED)

Command as launched from VS Code, where my launch.json has the added env entry "RLLIB_NUM_GPUS": "1":

$ cd /Users/rjf/dev/external/ray ; /usr/bin/env /Users/rjf/miniforge3/bin/python /Users/rjf/.vscode/extensions/ms-python.python-2022.8.1/pythonFiles/lib/python/debugpy/launcher 54469 -- /Users/rjf/dev/external/ray/rllib/examples/cartpole_lstm.py --framework torch 
(scheduler +8s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(scheduler +8s) Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
...

Ray status notifications in the console repeatedly say “PENDING” after that:

== Status ==
Current time: 2022-07-08 10:03:54 (running for 00:00:05.15)
Memory usage on this node: 10.6/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/10 CPUs, 0/0 GPUs, 0.0/6.89 GiB heap, 0.0/2.0 GiB objects
Result logdir: /Users/rjf/ray_results/PPO
Number of trials: 1/1 (1 PENDING)

Thanks in advance for any help.

Hi @robfitzgerald,

hard to say having not the whole config at hand. My guess is: you run with "num_workers" > 0 and each worker is requesting {"GPU": 1, "CPU": 1}. This cannot be fulfilled, and therefore the trial stays in “PENDING”.

If this is the case, try setting num_workers=3 and num_gpus_per_worker=0.25, for example.

Lars,

Thank you for the lead.

hard to say having not the whole config at hand

I am running the cartpole example, passing --framework torch as an argument. The config section begins here.

Following your suggestion, I have modified the general config section at line 63 (and tested with and without num_gpus, to check whether there was some kind of collision between these config keys):

            "env": StatelessCartPole,
            # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
            # "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
            "num_workers": 3,
            "num_gpus_per_worker": 0.25,
            "model": {
                "use_lstm": True,
                "lstm_cell_size": 256,
                "lstm_use_prev_action": args.use_prev_action,
                "lstm_use_prev_reward": args.use_prev_reward,
            },
            "framework": args.framework,
            # Run with tracing enabled for tfe/tf2?
            "eager_tracing": args.eager_tracing,

with similar results

without num_gpus key:

(scheduler +8s) Error: No available node types can fulfill resource request {'GPU': 0.25, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

with num_gpus=1:

(scheduler +8s) Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.
(scheduler +8s) Error: No available node types can fulfill resource request {'GPU': 0.25, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

It’s not obvious to me where to look in the documentation for a solution, besides the list of common parameters, which didn’t clear this up for me. Any other ideas?
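The two request shapes in the log above can be reasoned about with a small sketch. It assumes, as the log suggests, that RLlib asks for one bundle for the trainer process (using num_gpus) plus one bundle per rollout worker (using num_gpus_per_worker). The function names are hypothetical and this is not Ray's scheduler, only the arithmetic behind the message:

```python
from typing import Dict, List

Bundle = Dict[str, float]


def rllib_style_bundles(num_gpus: float, num_workers: int,
                        num_gpus_per_worker: float,
                        num_cpus_per_worker: float = 1.0) -> List[Bundle]:
    """Assumed request shape: one trainer bundle plus one bundle per worker."""
    trainer = {"CPU": 1.0, "GPU": float(num_gpus)}
    workers = [{"CPU": num_cpus_per_worker, "GPU": float(num_gpus_per_worker)}
               for _ in range(num_workers)]
    return [trainer] + workers


def unfulfillable(bundles: List[Bundle], node_totals: Bundle) -> List[Bundle]:
    """Bundles that ask for more of some resource than the node has in total."""
    return [b for b in bundles
            if any(amount > node_totals.get(name, 0.0)
                   for name, amount in b.items())]


# Totals taken from the status output: 0/10 CPUs, 0/0 GPUs.
node = {"CPU": 10.0, "GPU": 0.0}
bundles = rllib_style_bundles(num_gpus=1, num_workers=3, num_gpus_per_worker=0.25)
for bundle in unfulfillable(bundles, node):
    print("cannot fit:", bundle)
```

With zero GPUs registered, any bundle with a non-zero GPU share fails, so lowering the fraction to 0.25 cannot help. Only a request with no GPU share at all would be schedulable on this node.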

@robfitzgerald it appears to me as if Ray might not have recognized the GPU internally.

Hi both, I was just looking into this as well and came across this discussion. @robfitzgerald I am pretty sure the Ray Apple Silicon instructions you linked to are only about running Ray on Apple Silicon CPUs, not about GPU support.
The issue is that mps backend GPUs apparently need to be explicitly addressed in torch; they are separate from CUDA devices. As far as I can tell, this doesn’t happen in Ray/RLlib (yet). And then, separately from that, Ray Tune etc. would need to recognise the mps devices as GPUs too. So bottom line, I don’t think mps GPUs are supported right now.
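To illustrate what "explicitly addressed" means on the torch side, here is a minimal sketch. The helper names are made up for this example, and the preference order (CUDA, then mps, then CPU) is an assumption, not something RLlib does:

```python
def pick_device_name(cuda_ok: bool, mps_ok: bool) -> str:
    """Choose a torch device string; mps is its own device type, not a CUDA one."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_torch_device() -> str:
    """Probe torch if it is installed; fall back to "cpu" otherwise."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda_ok = torch.cuda.is_available()
    # Older torch builds have no mps backend module at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device_name(cuda_ok, mps_ok)


print(detect_torch_device())
```

The returned string can be passed to torch.device(...) and then to .to(device) on models and tensors. Code that only checks torch.cuda.is_available() will never select the Metal device.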

However, it seems that it works with tensorflow, using tensorflow-metal. Just do pip install tensorflow-metal, set framework to tf2 (not tf!), and it should see the GPU. Tested on my Intel MacBook with an AMD GPU, but I imagine it should work on M1 too.
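A quick way to check whether TensorFlow itself sees a GPU after installing the Metal plugin might look like the sketch below. The helper name is made up, and it simply returns an empty list when TensorFlow is not installed:

```python
from typing import List


def visible_tf_gpus() -> List[str]:
    """Names of the GPU devices TensorFlow reports, or [] if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(visible_tf_gpus())
```

An empty list here means the framework does not see the device, which is a separate question from whether Ray registers a GPU resource.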

@mgerstgrasser Yes, these installation instructions are what has gotten us to run Ray on our MacBooks.
@Lars_Simon_Zehnder That’s what it looks like to me, too.

@robfitzgerald RLlib does not care about where GPUs sit, what kind of GPU they are and is also not involved in recognizing them. It is just allocated a GPU resource and works from there. So the issue you are describing is much more likely to be an issue with Ray Core not recognizing the resource to begin with.

Can you reproduce without RLlib? That is, create a dummy actor that would require GPU(s) and try to create it?
If that reproduces the issue, could you change this post to be in the Ray Core sub-forum? Afaics most of our teams work on ordinary MacBooks or on clusters so we usually don’t deal with your special hardware setup.
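A reproduction without RLlib could be sketched roughly as follows. The helper names are hypothetical; the Ray calls used are ray.init, ray.cluster_resources, ray.remote, ray.wait and ray.get_gpu_ids:

```python
from typing import Dict, Optional


def has_gpu(resources: Dict[str, float]) -> bool:
    """True if a Ray resource dict advertises at least one GPU."""
    return resources.get("GPU", 0.0) > 0.0


def probe_gpu_actor(timeout_s: float = 10.0) -> Optional[Dict[str, bool]]:
    """Try to start an actor that needs one GPU and report what happened.

    Returns None when Ray is not installed. Note that this shuts Ray down
    again when it is done, so do not call it inside a running Ray job.
    """
    try:
        import ray
    except ImportError:
        return None

    ray.init(ignore_reinit_error=True)
    try:
        gpu_registered = has_gpu(ray.cluster_resources())

        @ray.remote(num_gpus=1)
        class Dummy:
            def gpu_ids(self):
                return ray.get_gpu_ids()

        actor = Dummy.remote()
        # If Ray has no GPU resource, the actor stays pending and wait() times out.
        ready, _ = ray.wait([actor.gpu_ids.remote()], timeout=timeout_s)
        return {"gpu_registered": gpu_registered, "actor_scheduled": bool(ready)}
    finally:
        ray.shutdown()


# Resource totals as in the status output earlier in the thread (no GPU entry).
print(has_gpu({"CPU": 10.0}))
```

Calling probe_gpu_actor() on the affected machine should show whether the actor is ever scheduled. If it is not, the problem reproduces in Ray Core, independent of RLlib.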

Cheers

@robfitzgerald @mgerstgrasser

it might be that, if you want to run this on M1, you also need to install tensorflow-macos.

Best

@arturn I think there are actually two issues. One is that mps devices in Torch need to be addressed separately from CUDA devices. That one is an RLlib issue, and I’ve opened a feature request for it: [RLlib] Support for mps (Apple Metal) GPUs in torch · Issue #28321 · ray-project/ray · GitHub

The second issue is Ray recognising Metal devices as GPUs. That one is presumably a Ray Core issue, but I don’t know nearly as much about it. I assume ray.get_gpu_ids() should show all available GPUs on my system if I call it after ray.init()? And I assume internally this just looks for CUDA devices, correct? I just noticed that it returns an empty list on my machine, even though I have a Metal GPU available.

I’m still able to use a GPU using tf2, but I think it’s because of something odd I am doing for unrelated reasons: I am wrapping my RLlib trainer in a function and passing that function to Ray Tune as a trainable, so I think Ray Tune doesn’t actually see how many resources the trainer requests. Hence it will happily start the trial even if Ray itself doesn’t detect the Metal GPU. I think this could work as a workaround for others too, and I suspect that with Ray Tune you could maybe even just set the resources_per_trial option, without having to wrap your trainer inside a function.

So to sum up for clarity: For tf2, metal GPUs are supported by the framework without modification, and so rllib already supports them. For torch, metal GPUs would need changes in rllib. For both tf2 and torch, it seems like Ray doesn’t detect metal GPUs, but you can get around this by just not requesting GPUs for your trainable.

@mgerstgrasser Your assumption is correct, ray.get_gpu_ids() should show all GPUs if you don’t put constraints in ray.init(). Behind the scenes, Ray will mask out GPUs with CUDA_VISIBLE_DEVICES.
@robfitzgerald If you do the work yourself, you can still introduce custom resources though. Since Ray and RLlib don’t care about resources they don’t recognize, it is up to your tensor processing backend to recognize them.
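A custom resource for the Metal device could be sketched as below. The resource label and helper names are hypothetical. Ray only does the bookkeeping for the label; selecting the mps device is still up to the framework code inside the task:

```python
from typing import Any, Dict

METAL_RESOURCE = "metal_gpu"  # hypothetical label; Ray attaches no meaning to it


def init_kwargs(metal_devices: float = 1.0) -> Dict[str, Any]:
    """Keyword arguments for ray.init() that advertise the custom resource."""
    return {"resources": {METAL_RESOURCE: metal_devices}}


def task_options(fraction: float) -> Dict[str, Any]:
    """Options for a task or actor that should hold part of the Metal device."""
    return {"num_cpus": 1, "resources": {METAL_RESOURCE: fraction}}


def run_demo() -> str:
    """Schedule one task against the custom resource (requires Ray)."""
    import ray

    ray.init(**init_kwargs())
    try:
        @ray.remote
        def use_metal() -> str:
            # Placeholder body: pick the mps device in your framework here.
            return "scheduled"

        ref = use_metal.options(**task_options(0.5)).remote()
        return ray.get(ref)
    finally:
        ray.shutdown()


print(init_kwargs(), task_options(0.5))
```

Calling run_demo() on a machine with Ray installed should schedule the task, since the node advertises one unit of the custom resource and the task asks for half of it.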

@arturn Got it. It also seems that this is easy to get around by just not having your trial request a GPU resource (it can of course still use a Metal device internally if one is available). Does that sound right to you? Since I doubt anyone will ever run Ray with Metal devices other than for local testing on a single machine, resource allocation is perhaps not super important, and so this is probably fine as a workaround.

Yep, working around it by ignoring such resource constraints should be fine. It will get trickier when you want RolloutWorkers to use dedicated resources to sample from the environment though. This is where custom resources would have to disentangle the resource usage!

Any update on this? Can we run Ray on the Apple GPU now to compute embeddings? If I set device = gpu, will it detect the Apple Silicon GPU? Thanks in advance!