Policy_Server num_rollout_workers>0

Denys_Ashikhin · March 30, 2023, 3:46pm

How severe does this issue affect your experience of using Ray?

Low: It annoys or frustrates me for a moment.

Hi all,

I have set up a working policy_server + policy_client training workflow with 1 server serving ~6 clients with inference_mode=remote. However, I am noticing that the server at times struggles to serve all the incoming requests. CPU and GPU utilisation are both around 25%-35%, with less than half of the GPU VRAM in use.

The server uses num_rollout_workers=0 because a long time ago (years) I had issues with setting the value >0.
So my question is what happens when I set it to 2? Will it create 2 copies of my model on the server’s gpu (assuming it can handle it) and load-distribute incoming GET_ACTION requests amongst the two workers?

P.S.
From Policy_Server example:

if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
    return PolicyServerInput(
        ioctx,
        SERVER_ADDRESS,
        args.port + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
    )

Or will I have to manually specify the ip+port for each client to pick which worker it talks to? In that case, the main policy_server doesn't load balance, but instead communicates with each worker to get the data batches from the clients?
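To make the port arithmetic in that snippet concrete, here is a minimal sketch of how I read it. `server_port` just copies the expression from the example; `client_port` is a hypothetical round-robin helper of my own (not part of RLlib), and the base port 9900 is an assumed value standing in for `args.port`.

```python
def server_port(base_port: int, worker_index: int) -> int:
    """Port expression copied from the Policy_Server example snippet."""
    return base_port + worker_index - (1 if worker_index > 0 else 0)


def client_port(base_port: int, client_id: int, num_rollout_workers: int) -> int:
    """Hypothetical helper: spread clients round-robin over the worker ports.

    With num_rollout_workers == 0 there is only one port (the local worker's).
    """
    n_ports = max(num_rollout_workers, 1)
    return base_port + client_id % n_ports


if __name__ == "__main__":
    BASE_PORT = 9900  # assumed stand-in for args.port
    NUM_ROLLOUT_WORKERS = 2

    # Going by the condition in the snippet, only workers 1..N would
    # create a PolicyServerInput when num_rollout_workers > 0.
    for idx in range(1, NUM_ROLLOUT_WORKERS + 1):
        print(f"worker {idx} -> port {server_port(BASE_PORT, idx)}")

    # Manual assignment of ~6 clients to those ports.
    for cid in range(6):
        print(f"client {cid} -> port {client_port(BASE_PORT, cid, NUM_ROLLOUT_WORKERS)}")
```

If this reading is right, num_rollout_workers=2 would give two listening ports (base and base+1), and each client would have to be pointed at one of them by hand.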
