Connecting from a local computer to an Azure Ray cluster

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi, I deployed an Azure Ray cluster using the template from ray/doc/azure/azure-ray-template.json at master · ray-project/ray (github.com).
The deployment works well; I can reach the dashboard on port 8265.

I have a script on my local computer that trains CartPole-v1. How can I connect to the Ray cluster so the training runs there?

import gymnasium as gym
import ray
from ray.rllib.algorithms.ppo import PPOConfig
import time

print("Ray version:", ray.__version__)
ray.init(address="ray://my-azure-head-ip-address:8265")
print("Ray is initialized:", ray.is_initialized())

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch"))

algorithm = config.build()
start = time.time()
iteration = 10
print("Training for", iteration, "iteration(s).")
for i in range(iteration):
    result = algorithm.train()
    print(f"Mean reward at {i}: {result['episode_reward_mean']}")

print("Training time:", time.time() - start, "seconds.")

In addition, I checked the file named ray-head.sh on the head virtual machine. The command is:
ray start --head --port=6379 --object-manager-port=8076 --num-gpus=$NUM_GPUS --block --dashboard-host 0.0.0.0
Then I tried to initialize Ray with
ray.init(address="ray://20.230.177.46:6379")
but I still cannot connect to the cluster.

Thank you!

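You can try ray.init(address="ray://20.230.177.46:10001") to connect remotely using Ray Client. Port 10001 is the default port of the Ray Client server on the head node. Port 6379 is used for internal communication between Ray nodes, and port 8265 serves the dashboard, so neither of them accepts ray:// connections. Port 10001 also has to be reachable from your machine, so check that the Azure network rules allow inbound traffic on it.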
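A minimal sketch of that connection, in case it helps. It assumes the head node runs the Ray Client server on the default port 10001 and that your local Ray and Python versions match the cluster's. The helper names (ray_client_address, port_is_reachable, connect) are made up for illustration and are not part of Ray.

```python
import socket


def ray_client_address(host: str, port: int = 10001) -> str:
    """Build a Ray Client address (10001 is the default Ray Client server port)."""
    return f"ray://{host}:{port}"


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def connect(host: str) -> None:
    """Connect to the cluster through Ray Client after a quick port check."""
    import ray  # imported here so the helpers above work without Ray installed

    if not port_is_reachable(host, 10001):
        raise ConnectionError(
            f"Port 10001 on {host} is not reachable; check the firewall rules."
        )
    ray.init(address=ray_client_address(host))
    print("Ray is initialized:", ray.is_initialized())
```

Calling connect("20.230.177.46") before config.build() should then place the training on the cluster. The port check is only there to separate a network problem from a Ray problem.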

However, it’s recommended to use the Ray Jobs API to run your workloads on the cluster instead of Ray Client. Jobs are submitted over HTTP through the dashboard port (8265), which you can already reach.
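A rough sketch of submitting the training script as a job with the Python SDK. It assumes the script is saved as train_cartpole.py in the current directory (a hypothetical file name) and that the script calls ray.init() without a ray:// address, since a job already runs inside the cluster. dashboard_url and submit_training_job are illustrative helper names, not Ray functions.

```python
def dashboard_url(host: str, port: int = 8265) -> str:
    """Build the HTTP address of the Ray dashboard, which also serves the Jobs API."""
    return f"http://{host}:{port}"


def submit_training_job(host: str, script: str = "train_cartpole.py") -> str:
    """Upload the current directory and run the script on the cluster as a Ray job."""
    from ray.job_submission import JobSubmissionClient  # needs Ray installed locally

    client = JobSubmissionClient(dashboard_url(host))
    job_id = client.submit_job(
        entrypoint=f"python {script}",
        # working_dir is uploaded to the cluster so the script is available there
        runtime_env={"working_dir": "."},
    )
    return job_id
```

For example, job_id = submit_training_job("20.230.177.46") returns an ID that you can look up on the dashboard. The CLI equivalent should be: ray job submit --address http://20.230.177.46:8265 --working-dir . -- python train_cartpole.py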