How to run the script in a distributed way?

I set up a Ray cluster manually:

$ ray start --head --port=6379
...
 To connect to this Ray runtime from another node, run
  ray start --address='<ip address>:6379' --redis-password='<password>'

Then I ran my script on the head node, and the following error occurred.

2021-04-23 20:47:33,509	WARNING worker.py:1107 -- Failed to unpickle the remote function 'sampler_multi.one_episode' with function ID fd6574e8a41423133de16b0bbc6a11c71911311819a020d0ab918b91. Traceback:
Traceback (most recent call last):
  File "/home/temp_user/.conda/envs/cluster/lib/python3.7/site-packages/ray/function_manager.py", line 180, in fetch_and_register_remote_function
    function = pickle.loads(serialized_function)
ModuleNotFoundError: No module named 'sampler_multi'

sampler_multi is one of my script files. I guess the error is caused by the script not being present on the slave nodes, but I don't know how to distribute the script to all of the slave nodes.
I'm looking forward to your answer. Thanks.

Right now, if you’re setting up a cluster manually, you’d also have to sync code files manually.
There’s ongoing work that will soon allow Ray to handle file syncing internally.

In the meantime, an alternative is to use the Ray autoscaler, which has a file_mounts setting for this purpose.
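
To make the error above more concrete: Ray serializes sampler_multi.one_episode by reference, so the worker process that picks up the task has to import sampler_multi before it can run it, and that import is exactly what fails in your traceback. A rough sketch of the situation (the call arguments are made up):

# driver, running on the head node
import ray
import sampler_multi            # present on the head node only

ray.init(address="auto")        # connect to the running cluster

# Submitting the task ships a reference to sampler_multi.one_episode.
# A worker on another node effectively does "import sampler_multi" while
# unpickling the function, and fails with ModuleNotFoundError if the file
# was never copied there.
future = sampler_multi.one_episode.remote()   # arguments omitted
result = ray.get(future)

So "syncing manually" just means getting sampler_multi.py (and whatever it imports) onto every node, somewhere on the workers' Python path, before you submit tasks.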

I am trying to configure the cluster .yaml to sync code files, but in practice the .yaml only starts the head node, not the worker node, in my private cluster.

Is this a problem with my configuration?
My other question is how to sync code files manually; I'm not clear where to put my code on the worker nodes.

Hoping for your reply.

Here is my YAML:

cluster_name: temp_user
min_workers: 1
initial_workers: 1
max_workers: 1
upscaling_speed: 1.0
idle_timeout_minutes: 5
docker: {}
provider:
    type: local
    head_ip: 172.31.233.205
    worker_ips: [172.31.233.204]
auth:
    ssh_user: temp_user
available_node_types:
    ray.head.default:
        resources: {}
        min_workers: 0
        max_workers: 0
        node_config: {}
    ray.worker.default:
        resources: {}
        min_workers: 0
        node_config: {}
head_node_type: ray.head.default
file_mounts: {
     "/home/temp_user/remote/repos/ray_elegantrl": "/home/zgy/repos/ray_elegantrl",
}
cluster_synced_files: []
file_mounts_sync_continuously: True
rsync_exclude: []
rsync_filter: []
initialization_commands:
    - >-
      conda activate cluster;
      pip install -U ray==1.2.0
setup_commands:
    - >-
      conda activate cluster;
      pip install -U ray==1.2.0
head_setup_commands:
    - >-
      conda activate cluster;
      pip install -U ray==1.2.0
worker_setup_commands:
    - >-
      conda activate cluster;
      pip install -U ray==1.2.0
head_start_ray_commands:
    - >-
      conda activate cluster;
      ray stop
    - >-
      conda activate cluster;
      ulimit -c unlimited;
      ray start --head --port=9998 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - >-
      conda activate cluster;
      ray stop
    - >-
      conda activate cluster;
      ray start --address=$RAY_HEAD_IP:9998
head_node: {}
worker_nodes: {}

Actually, autoscaler support for on-prem local clusters is currently broken…
Progress tracked here:
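
In the meantime, a quick way to confirm whether the worker actually joined the cluster is to check from a Python shell on the head node (a minimal sketch):

import ray

ray.init(address="auto")

# One entry per node that has joined the cluster; if only the head node's
# address shows up, the worker never connected.
for node in ray.nodes():
    print(node["NodeManagerAddress"], node["Alive"])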

Hi, I used your YAML file, but Ray only seems to sync the project folder to the head node. Do I need to manually sync the project to the worker nodes? Thanks!