KubeRay Won't Scale to Zero: datasets_stats_actor Persists

1. Severity of the issue:
Medium: Significantly affects my productivity but can find a workaround.

2. Environment:

  • Ray version: 2.48.0
  • Python version: 3.12.9
  • OS: Ubuntu (from the standard KubeRay worker image rayproject/2.48.0-py312-aarch64)
  • Cloud/Infrastructure: Kubernetes cluster

3. What happened vs. what you expected:
I’m running Ray Data batch jobs on a KubeRay cluster, and I’m having trouble getting the cluster to scale down to zero workers after a job completes.

  • Expected: after the Ray Data job finishes and the autoscaler’s idle timeout period (idleTimeoutSeconds) passes, all Ray worker pods should be terminated, scaling the cluster down to zero active workers

  • Actual: after the job finishes, one worker pod consistently remains running indefinitely and is never terminated. Investigation shows this is because the internal datasets_stats_actor remains alive on that node. The presence of this actor prevents the KubeRay autoscaler from considering the node idle, which blocks the final scale-down action

I tried to disable this actor by turning off automatic stats logging, but the worker pod still persists:

from ray.data import DataContext

ctx = DataContext.get_current()
ctx.enable_auto_log_stats = False

My Questions:

  1. What is the recommended, idempotent way to ensure a Ray Data job on KubeRay cleans up all its resources, including the datasets_stats_actor, to allow a scale-down to zero?
  2. Is there a way to have the datasets_stats_actor created on the head node instead of a worker node?