I’ve been deploying RayServices to a cluster with KubeRay installed. I initially configured the node group where Ray head nodes are scheduled to be quite minimal (nodes with 2 vCPU, 4Gi memory, and 30Gi storage). However, I found that the head node is responsible for pulling and extracting the Docker image for the worker nodes. So, in addition to the storage required for its own Docker image, the head node needs enough storage for the worker nodes' Docker image as well.
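To make the storage concern concrete, here is a rough back-of-the-envelope sketch of how I would estimate the disk a node needs if it has to hold both images. The image sizes, the extraction overhead factor, and the reserved space are all hypothetical placeholders, not values from Ray or KubeRay; the function name is mine as well.

```python
def required_node_storage_gib(image_sizes_gib, extraction_overhead=2.0, reserved_gib=5.0):
    """Rough estimate of the disk (GiB) a node needs to hold the given images.

    extraction_overhead is an assumed multiplier covering the compressed
    layers plus their extracted copies; reserved_gib is an assumed allowance
    for the OS, logs, and other ephemeral data.
    """
    if extraction_overhead < 1.0:
        raise ValueError("extraction_overhead must be at least 1.0")
    if any(size < 0 for size in image_sizes_gib):
        raise ValueError("image sizes must be non-negative")
    return sum(image_sizes_gib) * extraction_overhead + reserved_gib


# Hypothetical image sizes, for illustration only.
head_image_gib = 3.0
worker_image_gib = 12.0

estimate = required_node_storage_gib([head_image_gib, worker_image_gib])
print(f"Estimated storage needed: {estimate:.1f} GiB")
```

With these made-up numbers the estimate comes out above the 30Gi I provisioned, which would match what I observed, but the real figures depend on the actual image sizes and the container runtime.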
I couldn’t find a description in the Ray documentation of what a head node does to control and coordinate worker nodes, so I couldn’t infer what compute resources head nodes need. Is this documented somewhere? If not, what else should I take into consideration when determining the compute requirements for head node pods?