How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
What happens
- I start up a Ray cluster with a GPU enabled on each node.
- I follow the Ray documentation to install the NVIDIA Container Toolkit, adding the required commands to initialization_commands in my config.yaml.
- I submit my FastAPI script with the Ray CLI, but it reports that there is no GPU available (see the check sketched just after this list).
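For context, after the toolkit install I would expect a check along these lines to succeed (just a minimal sketch, run against the same cluster with ray submit; the task name is illustrative):

import ray

ray.init(address="auto")

# Schedule a task that explicitly requests one GPU and report whether
# CUDA is actually visible inside it.
@ray.remote(num_gpus=1)
def gpu_visible():
    import torch
    return torch.cuda.is_available()

print(ray.get(gpu_visible.remote()))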
config.yaml
-----------
cluster_name: minimal
max_workers: 4
upscaling_speed: 1.0
docker:
    image: "rayproject/ray-ml:latest-py38-gpu"
    container_name: "ray_container"
    pull_before_run: True
    run_options:  # Extra options to pass into "docker run"
        - --ulimit nofile=65536:65536
idle_timeout_minutes: 5
provider:
    type: gcp
    region: europe-west1
    availability_zone: europe-west1-b
    project_id: bert-training-test
auth:
    ssh_user: ubuntu
available_node_types:
    ray_head_default:
        resources: {"GPU": 1, "CPU": 8}
        node_config:
            machineType: n1-highmem-8
            guestAccelerators: [
                {
                    "acceleratorType": "nvidia-tesla-t4",
                    "acceleratorCount": 1
                }
            ]
            scheduling:
              - onHostMaintenance: TERMINATE
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                    diskSizeGb: 100
                    sourceImage: projects/ml-images/global/images/c2-deeplearning-pytorch-1-12-cu113-v20220701-debian-10
    ray_worker_small:
        min_workers: 0
        max_workers: 2
        resources: {"GPU": 1, "CPU": 8}
        node_config:
            machineType: n1-highmem-8
            guestAccelerators: [
                {
                    "acceleratorType": "nvidia-tesla-t4",
                    "acceleratorCount": 1
                }
            ]
            scheduling:
              - onHostMaintenance: TERMINATE
            disks:
              - boot: true
                autoDelete: true
                type: PERSISTENT
                initializeParams:
                    diskSizeGb: 100
                    sourceImage: projects/ml-images/global/images/c2-deeplearning-pytorch-1-12-cu113-v20220701-debian-10
            scheduling:
              - preemptible: true
head_node_type: ray_head_default
file_mounts: {
    "/entity-level-risk": "/Users/ljbails/Repositories/entity-level-risk"
}
cluster_synced_files: []
file_mounts_sync_continuously: False
rsync_exclude:
    - "**/.git"
    - "**/.git/**"
rsync_filter:
    - ".gitignore"
initialization_commands: [
    "sudo /opt/deeplearning/install-driver.sh",
    "distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
        && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
        && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list",
    'grep -l "nvidia.github.io" /etc/apt/sources.list.d/* | grep -vE "/nvidia-container-toolkit.list\$" | xargs sudo rm -rf',
    "sudo apt-get update",
    "sudo apt-get install -y nvidia-docker2",
    "sudo systemctl restart docker"
]
setup_commands: [
    "export CPPFLAGS='-std=c++98'",
    "cd /entity-level-risk && python -m pip install -e '.[deploy]'",
]
head_setup_commands:
    - pip install google-api-python-client==1.7.8
worker_setup_commands: []
head_start_ray_commands:
    - ray stop
    - >-
        ray start
        --head
        --port=6379
        --object-manager-port=8076
        --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - ray stop
    - >-
        ray start
        --address=$RAY_HEAD_IP:6379
        --object-manager-port=8076
my-app.py
-----------
import ray
from ray import serve
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from transformers import AutoModelForTokenClassification, AutoTokenizer
from pydantic import BaseModel
import torch
from torch.nn import functional as F
import pandas as pd
import numpy as np

print("######## DEVICE ########")
print("cuda:0" if torch.cuda.is_available() else "cpu")

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

ray.init(address="auto", namespace="serve")
serve.start(detached=True, http_options={"host": "0.0.0.0"})


@serve.deployment(route_prefix="/nrer",
                  num_replicas=2,
                  ray_actor_options={"num_gpus": 1, "num_cpus": 6})
@serve.ingress(app)
class NRERDeployment:
    def __init__(self):
        ...


NRERDeployment.deploy()
bash
-------
>> ray up config.yaml
>> ray submit config.yaml my-app.py
2022-08-09 10:36:17,569 INFO util.py:335 -- setting max workers for head node type to 0
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
/Users/ljbails/.pyenv/versions/3.9.11/envs/elr/lib/python3.9/site-packages/google/auth/_default.py:81: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
/Users/ljbails/.pyenv/versions/3.9.11/envs/elr/lib/python3.9/site-packages/google/auth/_default.py:81: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
/Users/ljbails/.pyenv/versions/3.9.11/envs/elr/lib/python3.9/site-packages/google/auth/_default.py:81: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. We recommend you rerun `gcloud auth application-default login` and make sure a quota project is added. Or you can use service accounts instead. For more information about service accounts, see https://cloud.google.com/docs/authentication/
warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
Fetched IP: 104.155.93.233
Shared connection to 104.155.93.233 closed.
Shared connection to 104.155.93.233 closed.
2022-08-09 10:36:26,079 INFO util.py:335 -- setting max workers for head node type to 0
Fetched IP: 104.155.93.233
Shared connection to 104.155.93.233 closed.
######## DEVICE ########
cuda:0
(ServeController pid=775) INFO 2022-08-09 02:36:36,171 controller 775 checkpoint_path.py:17 - Using RayInternalKVStore for controller checkpoint and recovery.
(ServeController pid=775) INFO 2022-08-09 02:36:36,274 controller 775 http_state.py:112 - Starting HTTP proxy with name 'SERVE_CONTROLLER_ACTOR:SERVE_PROXY_ACTOR-node:10.132.0.18-0' on node 'node:10.132.0.18-0' listening on '0.0.0.0:8000'
(HTTPProxyActor pid=807) INFO: Started server process [807]
(ServeController pid=775) INFO 2022-08-09 02:36:40,433 controller 775 deployment_state.py:1216 - Adding 2 replicas to deployment 'NRERDeployment'.
(scheduler +13s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(scheduler +13s) Adding 1 nodes of type ray_worker_small.
(ServeController pid=775) WARNING 2022-08-09 02:37:10,467 controller 775 deployment_state.py:1453 - Deployment 'NRERDeployment' has 1 replicas that have taken more than 30s to be scheduled. This may be caused by waiting for the cluster to auto-scale, or waiting for a runtime environment to install. Resources required for each replica: {'CPU': 6, 'GPU': 1}, resources available: {'CPU': 2.0}.
...
Note that torch.cuda.is_available() returned True, yet the controller reports resources available: {'CPU': 2.0}.
When I change num_replicas from 2 to 1, it works fine.
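In case it helps, this is the kind of snippet I would run (again via ray submit, purely as a diagnostic) to see what resources the cluster actually registers per node:

import ray

ray.init(address="auto")

# Aggregate view: everything Ray thinks the cluster has vs. what is currently free.
print(ray.cluster_resources())
print(ray.available_resources())

# Per-node view: does the worker node ever report a GPU?
for node in ray.nodes():
    print(node["NodeManagerAddress"], node["Resources"])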
Any idea what I’m doing wrong?