Apache Ray Yarn Multiple Clusters

abhishekkunal · March 8, 2022, 6:42pm

I am trying to start an Apache Ray cluster on YARN. According to the documentation, the following command is used to run a Ray application on a given YARN cluster:

skein application submit [cluster_configuration.yaml]

Here, cluster_configuration.yaml contains the specification of the Ray cluster that Skein creates on YARN.

A Ray cluster needs several ports to work, and these can be set through configuration, for example --node-manager-port and --object-manager-port.

My question is: if multiple users run their Ray applications and happen to specify the same numbers for these ports, will that cause a port conflict? If the ports are not specified manually, Ray picks available ports at random, but how does Ray handle the situation when users do specify them explicitly?
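To make the concern concrete, here is a minimal sketch of how a launch script could check a preferred port before passing it to ray start, and fall back to an OS-assigned port if it is already taken. This is not how Ray itself resolves ports; the helper names (find_free_port, is_port_free, build_ray_start_args) are hypothetical, and only the flag names come from the post above. It assumes a conflict can only occur when two containers land on the same host and share its network namespace.

```python
import socket
from contextlib import closing
from typing import Dict, List, Set


def find_free_port(host: str = "") -> int:
    """Ask the OS for a TCP port that is currently unused on this host."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def is_port_free(port: int, host: str = "") -> bool:
    """Return True if `port` can be bound on this host right now."""
    with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def build_ray_start_args(preferred: Dict[str, int]) -> List[str]:
    """Hypothetical helper: keep each preferred port if it is free,
    otherwise substitute a port chosen by the OS."""
    chosen: Set[int] = set()
    args: List[str] = []
    for flag, port in preferred.items():
        # Reject ports that are busy or already assigned to another flag.
        while port in chosen or not is_port_free(port):
            port = find_free_port()
        chosen.add(port)
        args.append(f"{flag}={port}")
    return args


if __name__ == "__main__":
    # Flag names are taken from the post; the port numbers are examples.
    print(build_ray_start_args({
        "--node-manager-port": 6380,
        "--object-manager-port": 6381,
    }))
```

Note that a check like this is inherently racy: another process on the same host can grab the port between the check and the moment Ray binds it, so it reduces the chance of a conflict rather than ruling it out.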

sangcho · March 9, 2022, 12:43pm

If multiple users share the same cluster, there should be no issue regarding port conflict. IIUC, skein application submit [cluster_configuration.yaml] → this doesn’t create a new cluster right?

abhishekkunal · March 10, 2022, 5:45am

I am not sure. Could you please suggest what a possible YAML file would look like for using an existing Ray cluster launched on YARN? I couldn’t find any documentation for that.

The sample YAML file clearly suggests that a new Ray cluster is created each time we submit this command.

For example, the following script is for the head node, and it executes the ray start command:

script: |
            # Activate the packaged conda environment
            #  - source environment/bin/activate

            # This stores the Ray head address in the Skein key-value store so that the workers can retrieve it later.
            skein kv put current --key=RAY_HEAD_ADDRESS --value=$(hostname -i)

            # This command starts all the processes needed on the ray head node.
            # By default, we set object store memory and heap memory to roughly 200 MB. This is conservative
            # and should be set according to application needs.
            #
            ray start --head --port=6379 --object-store-memory=200000000 --memory 200000000 --num-cpus=1

            # This executes the user script.
            python example.py

            # After the user script has executed, all started processes should also die.
            ray stop
            skein application shutdown current
