How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hi all,
I have set up a working policy_server + policy_client training workflow: 1 server serving ~6 clients with inference_mode=remote. However, I notice that the server sometimes struggles to keep up with all the incoming requests, even though CPU and GPU utilization sit at around 25-35% and less than half of the GPU VRAM is in use.
The server uses num_rollout_workers=0 because, years ago, I had issues setting the value > 0.
So my question is: what happens when I set it to 2? Will it create 2 copies of my model on the server's GPU (assuming it has enough memory) and distribute the incoming GET_ACTION requests between the two workers?
P.S.
From the policy server example:
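def _input(ioctx):
    # Remote workers (worker_index > 0), or the local worker when there are
    # no remote workers, each create their own PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            args.port + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # Local worker while remote workers exist: no input reader needed.
    else:
        return None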
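If I read the port expression above correctly, each worker that creates a PolicyServerInput gets its own port. Here is a small stand-alone sketch (plain Python, no RLlib involved; listening_ports and the base port 9900 are hypothetical) that just re-evaluates that expression for every worker index:

```python
def listening_ports(base_port: int, num_workers: int) -> dict:
    """Hypothetical helper: map worker_index -> port, mirroring _input().

    Worker 0 is the local worker; workers 1..num_workers are remote workers.
    This only re-evaluates the example's expression; it does not call RLlib.
    """
    ports = {}
    for worker_index in range(num_workers + 1):
        # Same condition as the example: remote workers, or the local worker
        # when there are no remote workers, get a PolicyServerInput.
        if worker_index > 0 or num_workers == 0:
            ports[worker_index] = base_port + worker_index - (1 if worker_index > 0 else 0)
    return ports


print(listening_ports(9900, 0))  # {0: 9900}
print(listening_ports(9900, 2))  # {1: 9900, 2: 9901}
```

Under this reading, num_rollout_workers=2 would give two separate servers on consecutive ports, with no server on the local worker.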
Or will I have to manually tell each client which worker's IP and port to use? In that case, does the main policy_server not load-balance at all, and instead communicate with each worker to get the data batches from the clients?
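If manual assignment does turn out to be necessary, this is roughly what I would try: spread the ~6 clients round-robin over the worker ports. It is only a sketch under assumptions. The helper client_addresses, the host, and the base port are hypothetical, and it assumes the workers listen on consecutive ports starting at the base port, as in the expression above:

```python
from typing import List


def client_addresses(host: str, base_port: int, num_workers: int, num_clients: int) -> List[str]:
    """Hypothetical helper: assign each client a worker port round-robin.

    Assumes workers listen on base_port, base_port + 1, ... (one per worker);
    with num_workers == 0 there is a single server on base_port.
    """
    num_ports = max(num_workers, 1)
    return [f"http://{host}:{base_port + (i % num_ports)}" for i in range(num_clients)]


# 6 clients spread over 2 rollout workers.
for i, address in enumerate(client_addresses("localhost", 9900, 2, 6)):
    print(i, address)
```

Each address would then be passed to its client, e.g. PolicyClient(address, inference_mode="remote").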