GPU Actors always pending with Ray Serve and Ray v2.0.0

We have been using Ray v1.13.0 with the Ray Serve component for our application
and are attempting to migrate to Ray v2. In short, Ray v2 is not working for us:
when we start a new actor that uses a GPU, it stays stuck in the
PENDING_CREATION state. With Ray v1.13.0, the exact same code spins up a GPU
actor that transitions to the ALIVE state.

A greatly simplified (hopefully working and reproducible) version of the Ray v1.13 code:

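#!/usr/bin/env python3

# Deploy script Ray v1

import logging
import os
import ray

import api  # module with the FastAPI app and the Api deployment class

ray.init(address="auto", namespace="serve")

# Api.__init__ takes no arguments in this simplified version.
api.Api.deploy()
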
#!/usr/bin/env python3

# Driver script, executed as `python3 -m driver` by the Docker container's `CMD`.

import os
import time

# Run the deploy script once, then keep the container alive.
os.system("python3 -m deploy")

while True:
    time.sleep(5)

#!/usr/bin/env python3

# FastAPI app and Serve deployment class

from typing import Dict, List

import ray
from fastapi import FastAPI
from ray import serve

from gpu_actor import GpuActor  # assumed module name for the GPU actor below

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class Api:
    def __init__(self):
        self._actors: List[Dict] = []
        self._next_id: int = 0

    @property
    def next_id(self) -> int:
        self._next_id = self._next_id + 1
        return self._next_id

    def initialized(self, id: int):
        # Called remotely by GpuActor once its constructor runs.
        self._actors.append(id)

    @app.post("/actor")
    def create_actor(self):
        # Pass a handle to this deployment so the actor can call back into it.
        deployment = ray.serve.get_deployment(self.__class__.__name__)
        this_actor = deployment.get_handle()
        id = self.next_id
        handle = GpuActor.remote(id, this_actor)
        self._actors.append({"id": id, "handle": handle})
        return id

#!/usr/bin/env python3

# GPU Actor

import ray
from ray.actor import ActorHandle

@ray.remote
class GpuActor:
    def __init__(self, id: int, manager: ActorHandle):
        self._id = id
        self._manager = manager
        # Notify the Api deployment that this actor has started.
        self._manager.initialized.remote(self._id)

    def id(self) -> int:
        return self._id

    def run(self):
        pass

We are using the deployment class as an actor manager because we have other
endpoints and functionality to stop/spin down the GPU actors, and we wanted to
minimize the number of “support” actors using up CPU resources (i.e., cores) on
our resource-constrained targets.

With Ray v1.13.0, the initialized remote function of the Api deployment class
is executed by GpuActor, and the Ray Dashboard shows the GPU Actor as alive and
using a GPU. When we bump to Ray v2.0.0, the initialized remote function is
never executed, and the Ray Dashboard shows the GPU Actor in the
PENDING_CREATION state.

There is a little more to the configuration. The above Python code is run inside
a docker container with the following simplified entrypoint script.

# Entrypoint script for CPU-only head node docker container

# Display Version
ray --version

ray start --head --num-gpus=0

serve start --http-port=7777

exec "$@"

The head container/node does not have any GPUs. A separate GPU node is attached
to this Ray cluster with the following entrypoint script.

# Entrypoint script for GPU worker node docker container

# Display Version
ray --version

ray start --address=<HEAD NODE IP ADDRESS>

# python3 -c 'import ray; ray.init(address="auto"); print("Node initialized: {}".format(ray.is_initialized()))'

# Sleep
sleep infinity

With Ray v1.13, we are able to start the head node and “API” Python code. We
initially see no GPUs available. Then, we start up the GPU worker node and we
can see it added to the cluster in the Ray Dashboard. In one such environment,
the GPU worker node has two GPUs and we see both GPUs for the node in the Ray
Dashboard. With Ray v2.0.0, we see an identical configuration and display in the
Ray Dashboard.
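
The Dashboard view can also be cross-checked from Python. The following is only a sketch, not part of our application: missing_resources and report are hypothetical helper names, and the {"GPU": 1} request is an assumption about what the GPU actor asks for. It compares a resource request against ray.cluster_resources() (totals) and ray.available_resources() (currently free).

```python
from typing import Dict


def missing_resources(required: Dict[str, float],
                      available: Dict[str, float]) -> Dict[str, float]:
    """Return the shortfall per resource; an empty dict means the request fits."""
    shortfall = {}
    for name, amount in required.items():
        have = available.get(name, 0.0)
        if have < amount:
            shortfall[name] = amount - have
    return shortfall


def report(required: Dict[str, float]) -> None:
    """Connect to the running cluster and print totals versus free resources."""
    import ray  # imported lazily so the helper above works without a cluster

    ray.init(address="auto", ignore_reinit_error=True)
    total = ray.cluster_resources()
    free = ray.available_resources()
    print("total:", total)
    print("free: ", free)
    # A shortfall against the totals means the request can never be placed.
    print("short versus total:", missing_resources(required, total))
    # A shortfall against the free pool means something else holds the resource.
    print("short versus free: ", missing_resources(required, free))
```

Calling report({"GPU": 1}) on the head node after the GPU worker has joined should show whether the scheduler sees the GPUs at all, or whether they are visible but already claimed.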

We recognize that Ray v2.0.0 introduces a new Ray Serve API and deprecates much
of the Ray Serve API and CLI that we use with Ray v1.13. So, we tried to migrate
to Ray v2 following the migration guide. However, we use HTTP port 7777 rather
than the default of 8000, and we had to implement a workaround based on this
comment and issue. The Ray v2 deployment script looks like:

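#!/usr/bin/env python3

# Deploy script Ray v2

import logging
import os
import ray
from ray import serve

import api  # module with the FastAPI app and the Api deployment class

ray.init(address="auto", namespace="serve")

# Workaround for the non-default HTTP port: shut down any running Serve
# instance so that serve.run() starts it again on port 7777.
serve.shutdown()

deployment = api.Api.options(route_prefix="/api").bind()

serve.run(deployment, port=7777)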

We removed serve start --http-port=7777 from the entrypoint script and we
removed route_prefix from the serve.deployment decorator for the Api
class. We left the deprecated get_deployment and get_handle usage because we
could not figure out how to replicate this functionality with the Ray v2 “bind”
API.

Despite migrating as best we could to the Ray v2 API, the initialized remote
method is never executed and a PENDING_CREATION actor is observed in the Ray
Dashboard.

Because of the issue with the HTTP port configuration, we could not use the
procedure recommended in the Ray v2 documentation for deployments with the CLI,
i.e., serve run deploy:api. We are not sure whether the HTTP port workaround
conflicts with our deployment implementation and is what blocks us from
spinning up GPU Actors. CPU Actors appear to work as expected, but they are
spinning up on the head node. Again, our implementation and architecture work
well and are very stable with Ray v1.13.0.
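
Since CPU Actors land on the head node while GPU Actors stay pending, one thing worth checking is which nodes could host a GPU request at all. This is a rough sketch under assumptions: feasible_nodes and gpu_capable_nodes are hypothetical helper names, the {"GPU": 1} request is assumed, and the check uses the "Alive", "Resources", and "NodeManagerAddress" fields of the dicts returned by ray.nodes(). It looks at total node capacity only, not at what is currently free.

```python
from typing import Any, Dict, List


def feasible_nodes(required: Dict[str, float],
                   nodes: List[Dict[str, Any]]) -> List[str]:
    """Return addresses of alive nodes whose total resources cover the request."""
    matches = []
    for node in nodes:
        if not node.get("Alive", False):
            continue  # dead nodes can never host the actor
        resources = node.get("Resources", {})
        if all(resources.get(name, 0) >= amount
               for name, amount in required.items()):
            matches.append(node.get("NodeManagerAddress", "<unknown>"))
    return matches


def gpu_capable_nodes() -> List[str]:
    """Query the running cluster for nodes that advertise at least one GPU."""
    import ray  # imported lazily so feasible_nodes works without a cluster

    ray.init(address="auto", ignore_reinit_error=True)
    return feasible_nodes({"GPU": 1}, ray.nodes())
```

If gpu_capable_nodes() returns an empty list under Ray v2.0.0 but lists the worker under Ray v1.13.0, the difference is in how the worker registers its GPUs rather than in the Serve deployment.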

Any help and/or information would be greatly appreciated. At the moment, this is blocking us from moving to Ray v2.