Serving reinforcement learning policy models via KubeRay

Hello everyone,

I’m currently working on deploying an RL policy model on KubeRay, using the examples provided in the Ray documentation for serving reinforcement learning policy models (link) and the KubeRay guidance documentation (link).

My objective is to create a deployment graph similar to the fruit stand example, but with RL agents trained with Ray AIR. I have successfully built Serve deployment classes that use RL predictors and restore checkpoints from a URI, and I can request predictions over localhost. However, when I deploy the same model on KubeRay, the Serve deployment fails, restarts, attempts to move to another worker, and then fails again. I have tried allocating more resources by adding additional nodes, but this has not resolved the problem.

I would appreciate it if anyone who has experience creating deployment graphs to serve RL models on KubeRay could share their insights and potentially help me troubleshoot the issue I’m facing.

Here is a generalized version of the code I have been working on:

from starlette.requests import Request
import numpy as np
from ray import serve
from ray.air.checkpoint import Checkpoint
from ray.serve.air_integrations import PredictorWrapper
from ray.train.rl.rl_predictor import RLPredictor

# Load the trained policy checkpoint from cloud storage (URI elided).
checkpoint_path = Checkpoint.from_uri("gs://...")

@serve.deployment
class ServeModel:
    def __init__(self, checkpoint_path) -> None:
        # Wrap the RLPredictor so the policy can be restored from the checkpoint inside the replica.
        self.algorithm = PredictorWrapper(RLPredictor, checkpoint_path)

    async def __call__(self, request: Request):
        # Expect a JSON payload of the form {"observation": [...]}.
        json_input = await request.json()
        obs = json_input["observation"]

        # Run the policy on the observation and return the chosen action.
        action = await self.algorithm.predict(np.array(obs))
        return {"action": int(action)}

# Bind the deployment to the checkpoint; `model` is the graph node that Serve imports.
model = ServeModel.bind(checkpoint_path)

In this code, I use the starlette Request class to handle incoming requests. The ServeModel class is decorated with @serve.deployment to mark it as a Ray Serve deployment, which is what gets deployed on KubeRay. The checkpoint_path is passed to the class during initialization, and the RLPredictor is wrapped in PredictorWrapper for integration with Ray Serve.

The __call__ method handles the incoming requests, extracts the observation from the JSON payload, and uses the RL predictor to make predictions. The predicted action is then returned in the response.
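
For reference, this is roughly how I query the deployment once it is running locally; the observation values below are placeholders, and the real shape depends on my environment:

import requests

# Placeholder observation; the real shape depends on the environment.
sample_obs = [0.1, 0.2, 0.3, 0.4]

# The deployment is served at route prefix "/" on Serve's default port 8000.
resp = requests.post("http://localhost:8000/", json={"observation": sample_obs})
print(resp.json())  # e.g. {"action": 1}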

Please note that "gs://..." in checkpoint_path is a placeholder for the actual URI of your checkpoint location; I include it only to show that I am using Google Cloud Storage to store my trained models.
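
For completeness, I can also restore the same checkpoint outside of Serve and query the policy directly, roughly like this (again with a placeholder observation):

import numpy as np
from ray.air.checkpoint import Checkpoint
from ray.train.rl.rl_predictor import RLPredictor

# Restore the predictor straight from the checkpoint (URI elided, as above).
predictor = RLPredictor.from_checkpoint(Checkpoint.from_uri("gs://..."))

# Placeholder batch of one observation; the real shape depends on the environment.
print(predictor.predict(np.array([[0.1, 0.2, 0.3, 0.4]])))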

In addition, I am using a KubeRay example manifest that is supposed to work similarly to the fruit stand example:

# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serviceUnhealthySecondThreshold: 300 # Config for the health check threshold for service. Default value is 60.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for deployments. Default value is 60.
  serveConfig:
    importPath: rl_composition:model
    deployments:
      - name: ServeModel
        numReplicas: 1
        routePrefix: "/"
        rayActorOptions:
          numCpus: 0.1
  rayClusterConfig:
    rayVersion: '2.4.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      #pod template
      template:
        spec:
          serviceAccountName: SA_NAME_HERE
          containers:
            - name: ray-head
              image: YOUR_DOCKER_IMAGE_NAME_HERE # Replace with your Docker image name
              resources:
                limits:
                  cpu: 2
                  memory: 2Gi
                requests:
                  cpu: 2
                  memory: 2Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # the number of worker pod replicas in this group
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name; for this example it is called small-group
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        #pod template
        template:
          spec:
            serviceAccountName: SA_NAME_HERE
            containers:
              - name: ray-worker # must consist of lowercase alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
                image: YOUR_DOCKER_IMAGE_NAME_HERE # Replace with your Docker image name
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
                  requests:
                    cpu: "500m"
                    memory: "2Gi"
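
For clarity, the importPath rl_composition:model in serveConfig assumes that the Python code above lives in a module named rl_composition.py baked into the Docker image, with the bound graph exposed as a top-level variable (the file name is my choice; it only needs to match the import path):

# rl_composition.py -- the module referenced by importPath "rl_composition:model"
from ray import serve
from ray.air.checkpoint import Checkpoint
from ray.serve.air_integrations import PredictorWrapper
from ray.train.rl.rl_predictor import RLPredictor

checkpoint_path = Checkpoint.from_uri("gs://...")


@serve.deployment
class ServeModel:
    def __init__(self, checkpoint_path) -> None:
        self.algorithm = PredictorWrapper(RLPredictor, checkpoint_path)

    # ... __call__ as in the snippet above ...


# `model` is the top-level variable that the import path resolves to;
# the RayService controller imports it when applying serveConfig.
model = ServeModel.bind(checkpoint_path)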

Thank you in advance for your assistance!

@rrmartin do you have any relevant error log output for the failing deployments? Does the KubeRay cluster generally work (can you run other workloads on it)?

cc @Akshay_Malik @cindy_zhang for serve