I'd like to create an IaC script to run RLlib training jobs on AWS.
To do this I'm using Ray's cluster launcher to define the head and worker node setup.
Bearing in mind that I need a GPU only on the head node (where the optimizer runs) and CPU-only workers, what would be the best choice of instance types and image IDs (AMIs) for the head and the workers?
For now my setup is:
- Head:
  - InstanceType: g4dn.xlarge
  - ImageId: Deep Learning AMI GPU CUDA 11.3.1 (Ubuntu 20.04)
- Workers:
  - InstanceType: c5.xlarge
  - ImageId: Ubuntu Server 20.04 LTS (HVM), SSD
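
For reference, the setup above could be expressed in a Ray cluster launcher YAML roughly like this. This is a sketch only: the cluster name, node-type names, region, worker counts, and the placeholder AMI IDs are assumptions, not tested values (AMI IDs are region-specific and need to be looked up):

```yaml
# cluster.yaml -- sketch of a Ray cluster launcher config
cluster_name: rllib-training  # assumed name

provider:
  type: aws
  region: us-east-1  # assumed region; pick yours

available_node_types:
  head_node:
    node_config:
      InstanceType: g4dn.xlarge
      # Placeholder: substitute the Deep Learning AMI ID for your region
      ImageId: ami-xxxxxxxxxxxxxxxxx
  cpu_worker:
    min_workers: 2   # assumed worker counts
    max_workers: 10
    node_config:
      InstanceType: c5.xlarge
      # Placeholder: substitute a plain Ubuntu Server 20.04 AMI ID
      ImageId: ami-yyyyyyyyyyyyyyyyy

head_node_type: head_node
```

A config like this would be launched with `ray up cluster.yaml`; Ray's autoscaler then creates the head node first and adds workers up to `max_workers` as needed.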
How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.