GCP custom machine type and best practices for node choice

Lars_Simon_Zehnder · January 18, 2022, 10:05pm

Hi everyone,

I have a question regarding the cluster config example for GCP. The reason is that I run into memory problems when running my RLlib training jobs.

I see in the ray_head_gpu node description:

node_config:
     machineType: custom-6-16384

I assume this custom machine type has 6 cores and 16 GB of memory? Is this the general way to tell Compute Engine how to put together my virtual machine, so that writing custom-6-32768 gives me double the memory?

Are there any guidelines or best practices as to how to choose the memory and core resources on head and worker nodes?

Thanks in advance!
Simon

ckw017 · January 19, 2022, 12:56am

Your method of selecting cores/memory is correct. The fields under node_config are based on this API: Method: instances.insert (Compute Engine documentation, Google Cloud). From that page:

To create a custom machine type, provide a URL to a machine type in the following format, where CPUS is 1 or an even number up to 32 (2, 4, 6, … 24, etc), and MEMORY is the total memory for this instance. Memory must be a multiple of 256 MB and must be supplied in MB (e.g. 5 GB of memory is 5120 MB):

zones/zone/machineTypes/custom-CPUS-MEMORY
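As a rough illustration of the naming rule quoted above, here is a minimal Python sketch that builds and parses custom-CPUS-MEMORY names. The helper names (custom_machine_type, parse_custom_machine_type) are hypothetical and not part of Ray or any Google Cloud library. The checks cover only the constraints quoted in this post; Compute Engine may enforce additional limits that are not modelled here.

```python
import re
from typing import Tuple

MAX_CPUS = 32         # upper bound from the documentation excerpt quoted above
MEMORY_STEP_MB = 256  # memory must be a multiple of 256 MB


def custom_machine_type(cpus: int, memory_mb: int) -> str:
    """Build a name such as 'custom-6-16384' (hypothetical helper).

    Validates only the rules quoted in this thread: CPUS is 1 or an even
    number up to 32, and MEMORY is given in MB as a multiple of 256.
    """
    if cpus != 1 and (cpus % 2 != 0 or not 2 <= cpus <= MAX_CPUS):
        raise ValueError(f"cpus must be 1 or an even number up to {MAX_CPUS}, got {cpus}")
    if memory_mb <= 0 or memory_mb % MEMORY_STEP_MB != 0:
        raise ValueError(f"memory_mb must be a positive multiple of {MEMORY_STEP_MB}, got {memory_mb}")
    return f"custom-{cpus}-{memory_mb}"


def parse_custom_machine_type(name: str) -> Tuple[int, float]:
    """Split 'custom-CPUS-MEMORY' into (cpus, memory in GB), with 1 GB = 1024 MB."""
    match = re.fullmatch(r"custom-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"not a custom machine type name: {name!r}")
    cpus, memory_mb = int(match.group(1)), int(match.group(2))
    return cpus, memory_mb / 1024


if __name__ == "__main__":
    # The type from the example config: 6 CPUs, 16384 MB = 16 GB.
    print(parse_custom_machine_type("custom-6-16384"))
    # Same CPU count with double the memory.
    print(custom_machine_type(6, 32768))
```

So custom-6-16384 reads as 6 CPUs with 16 GB, and custom-6-32768 keeps the CPU count while doubling the memory to 32 GB.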

Note that you’ll also want to update this line to match your CPU count.

Resource needs will depend on the application.

Lars_Simon_Zehnder · January 19, 2022, 2:35pm

Hi @ckw017 ,

Thank you for the quick reply and the helpful links. I read about custom machine types on GCP, but I could not find any information about the specifics of the naming (custom-cpu-memory). Where does it say that a custom machine type has to be named this way? If I write e.g. this-is-some-machine, I guess I get an error, since either a custom type or a standard machine type has to be provided.

Second: my question was poorly formulated. What I wanted to know was whether there are any guidelines or best practices for choosing node sizes for a Ray cluster (head/worker).

ckw017 · January 19, 2022, 6:31pm

For the first question, the details for custom node naming should be on the page I linked here. I’ve included a screenshot of the section detailing it (try just searching for machineType on the page)

We have some advice for picking node types here. The node types are specific to AWS, but should have equivalents on GCP. The "How many CPUs/GPUs?" section should be useful here. In particular, the Ray Dashboard can give you an idea of which resources your workload is consuming or bottlenecking on.

Lars_Simon_Zehnder · January 19, 2022, 8:22pm

Hi @ckw017 ,

My fault. I did follow the link, but was unsure what to look for. Thank you for coming back to this, and thank you for the info about Ray cluster nodes.
