Does Ray 2.7.0 support Ascend NPU?

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I'm using this demo (ray-service.text-summarizer.yaml) to test.

I edited the YAML's workerGroupSpecs section, like this:
```yaml
workerGroupSpecs:
# The pod replicas in this group typed worker
- replicas: 1
  minReplicas: 1
  maxReplicas: 10
  groupName: gpu-group
  rayStartParams:
    resources: '{"NPU": 1}'
  # Pod template
  template:
    spec:
      nodeName: npu-1
      containers:
      - name: ray-worker
        image: registry.paas/cmss/rayproject/ray-ml:2.7.0
        volumeMounts:
        - mountPath: /tmp/ray
          name: ray-logs
        - mountPath: /mnt
          name: zip
        resources:
          limits:
            cpu: 4
            memory: "16G"
          requests:
            cpu: 3
            memory: "12G"
```

When I applied this file with kubectl, the worker pod's status was CrashLoopBackOff. I got this error:

```
kubectl --namespace ray-system logs pod/text-summarizer-raycluster-mzs2d-worker-gpu-group-kk2sr
Defaulted container "ray-worker" out of: ray-worker, wait-gcs-ready (init)
Usage: ray start [OPTIONS]
Try 'ray start --help' for help.

Error: Got unexpected extra argument (1})
```

Try adding an extra quoting layer so the JSON reaches `ray start` as a single argument:

```yaml
resources: "'{\"NPU\": 1}'"
```
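To see why the extra quoting layer matters, here is a small illustration using Python's `shlex` to mimic how a POSIX shell tokenizes the `ray start` command line (the flag value is the one from the YAML above; the exact command ray assembles inside the pod may differ):

```python
import json
import shlex

# Without the inner single quotes, the shell strips the double quotes and
# splits the JSON at the space, so `ray start` sees two arguments:
bad = 'ray start --resources={"NPU": 1}'
print(shlex.split(bad))
# ['ray', 'start', '--resources={NPU:', '1}']
# The stray '1}' token is the "Got unexpected extra argument (1})" above.

# With an extra single-quote layer, the JSON survives as one argument:
good = 'ray start --resources=\'{"NPU": 1}\''
tokens = shlex.split(good)
print(tokens)
# ['ray', 'start', '--resources={"NPU": 1}']

# ...and the flag's value now parses cleanly as JSON:
print(json.loads(tokens[-1].split("=", 1)[1]))
# {'NPU': 1}
```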

Hi @pengxiang_chen, the latest Ray supports Ascend NPU.

When using Ray Train, it relies on transformers + accelerate to set the device. accelerate reads os.environ["ACCELERATE_TORCH_DEVICE"], which is set by ray/train/torch/config.py (in fact, by ray/air/_internal/torch_utils.py), but that code only picks up CUDA devices…

Hi @pengxiang_chen, Ray Train currently only supports CPU and GPU. We are working hard to make Ray Train support the Huawei Ascend NPU and more third-party devices; the relevant PR is [Train] Decouple device-related modules and add Huawei NPU support to Ray Train #44086.

Please feel free to follow this PR and provide any of your requirements or suggestions. Once this PR is merged and released, ACCELERATE_TORCH_DEVICE will be able to access NPU, and we will also test running transformer+accelerate on NPUs.
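As a rough sketch of the kind of device dispatch this enables (a hypothetical helper, not Ray's actual implementation; the real logic lives in ray/air/_internal/torch_utils.py, and the resource names here are illustrative):

```python
import os

def resolve_torch_device(assigned_resources):
    """Map a worker's assigned accelerator resources to the device string
    exported as ACCELERATE_TORCH_DEVICE.

    Hypothetical sketch: before PR #44086 the CUDA branch was effectively
    the only accelerator path; the NPU branch illustrates what decoupling
    the device-related modules makes possible.
    """
    if assigned_resources.get("GPU", 0) > 0:
        device = "cuda:0"
    elif assigned_resources.get("NPU", 0) > 0:
        device = "npu:0"  # Ascend NPUs via torch_npu use the "npu" device type
    else:
        device = "cpu"
    os.environ["ACCELERATE_TORCH_DEVICE"] = device
    return device

print(resolve_torch_device({"NPU": 1}))  # npu:0
print(resolve_torch_device({"GPU": 1}))  # cuda:0
print(resolve_torch_device({}))          # cpu
```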

We appreciate your interest and look forward to your feedback.

The PR looks active here (thank you @liuxsh9 for the contribution, as always); cc @yunxuanx for context on the community demand for this.

The PR on Ray Train side has been merged! [Train] Decouple device-related modules and add Huawei NPU support to Ray Train by liuxsh9 · Pull Request #44086 · ray-project/ray · GitHub
