1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity, but I can find a workaround.
High: Completely blocks me.
2. Environment:
- Ray version: 2.50.0
- Python version: 3.10
- OS: Linux
- Cloud/Infrastructure:
- Other libs/tools (if relevant): accelerate
3. What happened vs. what you expected:
- Expected: The Accelerator from accelerate uses all GPUs assigned to the worker loop function, since they are visible through the CUDA_VISIBLE_DEVICES that the actor sets from resources_per_worker.
- Actual: The Accelerator uses a single GPU even when several GPUs are visible inside the worker loop function.
Say I have 4 GPUs and I specify in resources_per_worker that each worker can use 2 GPUs. When I start my HPO run, Ray reports that each actor uses 2 GPUs, but in reality each actor uses only 1. I checked that CUDA_VISIBLE_DEVICES inside the worker loop function lists the expected GPUs, and it does. Yet right after I create the Accelerator, its num_processes attribute is 1, so it uses a single GPU. I also tried accelerate launch instead, but then no worker loop function starts at all.
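As an extra data point, here is a minimal diagnostic I can drop into the worker loop function (my assumption being that Accelerate derives num_processes from the torch.distributed environment, e.g. WORLD_SIZE, rather than from CUDA_VISIBLE_DEVICES):

```python
import os

# Diagnostic sketch: print the torch.distributed environment variables that
# Accelerate is (I assume) reading, next to CUDA_VISIBLE_DEVICES. If
# WORLD_SIZE is 1, Accelerate would report num_processes == 1 regardless of
# how many GPUs are visible to the process.
for var in ("RANK", "WORLD_SIZE", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT"):
    print(var, "=", os.getenv(var))
print("CUDA_VISIBLE_DEVICES =", os.getenv("CUDA_VISIBLE_DEVICES"))
```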
I have code that looks something like this:
```python
import os
from typing import Any

import ray
from accelerate import Accelerator
from ray.train import RunConfig
from ray.train.torch import TorchTrainer
from ray.tune import TuneConfig, Tuner, choice
from ray.tune.search.optuna import OptunaSearch


def hpo_loop(config: dict[str, Any]) -> None:
    # gives 0,1,2
    print(os.getenv("CUDA_VISIBLE_DEVICES"))
    accelerator = Accelerator(mixed_precision="bf16", cpu=False)
    # gives 1
    print(accelerator.num_processes)
    # gives MULTI_GPU, so I don't think this is the reason
    print(accelerator.distributed_type)


if __name__ == "__main__":
    resources_per_worker = {"CPU": 4, "GPU": 2}
    trainer = TorchTrainer(
        train_loop_per_worker=hpo_loop,
        scaling_config=ray.train.ScalingConfig(
            resources_per_worker=resources_per_worker, use_gpu=True
        ),
    )
    search_alg = OptunaSearch(seed=42)
    tuner = Tuner(
        trainable=trainer,
        tune_config=TuneConfig(
            metric="loss",
            mode="min",
            search_alg=search_alg,
            num_samples=4,
        ),
        run_config=RunConfig(stop={"training_iteration": 1}),
        param_space={"train_loop_config": {"lr": choice([0.01, 0.001])}},
    )
    tuner.fit()
```
I run it with plain Python, something like `python3 hpo.py`.
I use Hydra for configuration, but I don't think that is the problem; all config params are parsed correctly in each worker.
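For completeness, the only workaround I can think of is sketched below. It rests on my assumption that Accelerate's num_processes reflects the torch.distributed world size that Ray Train sets up (one process per Ray worker), not the number of visible GPUs, so I request one GPU per worker and scale out with num_workers instead:

```python
# Workaround sketch (assumption: num_processes equals Ray Train's world size,
# i.e. one process per worker). Two workers with one GPU each should then
# give num_processes == 2 inside hpo_loop, instead of one worker that sees
# two GPUs but runs as a single process.
trainer = TorchTrainer(
    train_loop_per_worker=hpo_loop,
    scaling_config=ray.train.ScalingConfig(
        num_workers=2,
        use_gpu=True,
        resources_per_worker={"CPU": 4, "GPU": 1},
    ),
)
```

But I would still like to understand whether a single worker can drive multiple GPUs through the Accelerator at all.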
Can you explain what I am doing wrong and how to fix it?