In the client/server cartpole example, multiple clients do NOT speed up training

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Setup: the Ray 1.13 image is used to run the experiment.

I ran the client/server cartpole example from the RLlib serving examples.
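For context, each client essentially runs a loop like the one below. This is a simplified sketch paraphrasing the example client, not the exact script; the address http://localhost:9900 and inference_mode="local" are assumptions based on the example's defaults.

```python
import gym
from ray.rllib.env.policy_client import PolicyClient

# Connect to the policy server started by the cartpole server example.
# "local" inference mode keeps a copy of the policy on the client and
# periodically pulls updated weights; "remote" asks the server for every action.
client = PolicyClient("http://localhost:9900", inference_mode="local")

env = gym.make("CartPole-v0")
obs = env.reset()
episode_id = client.start_episode(training_enabled=True)

while True:  # the real example also has a stop condition; omitted here
    action = client.get_action(episode_id, obs)
    obs, reward, done, info = env.step(action)
    # Push the reward back so the server can build training batches.
    client.log_returns(episode_id, reward)
    if done:
        client.end_episode(episode_id, obs)
        obs = env.reset()
        episode_id = client.start_episode(training_enabled=True)
```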
First, I ran a single client against the server. The results are:
1 client, 1 server →
== Status ==
Current time: 2023-05-03 07:37:58 (running for 00:02:33.03)
Memory usage on this node: 6.0/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs, 0.0/8.61 GiB heap, 0.0/4.3 GiB objects
Result logdir: /home/ray/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)
+----------------------+------------+-----------------+------+----------------+-------+--------+--------------------+--------------------+------------------+
| Trial name           | status     | loc             | iter | total time (s) |    ts | reward | episode_reward_max | episode_reward_min | episode_len_mean |
|----------------------+------------+-----------------+------+----------------+-------+--------+--------------------+--------------------+------------------|
| PPO_None_bdb63_00000 | TERMINATED | 172.17.0.3:8687 |   13 |        145.387 | 52000 | 190.71 |                200 |                 31 |           190.71 |
+----------------------+------------+-----------------+------+----------------+-------+--------+--------------------+--------------------+------------------+

2023-05-03 07:37:58,784 INFO tune.py:747 -- Total run time: 154.03 seconds (152.99 seconds for the tuning loop).

Then I ran 4 clients against 1 server →
== Status ==
Current time: 2023-05-03 08:18:49 (running for 00:32:26.40)
Memory usage on this node: 9.5/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs, 0.0/8.6 GiB heap, 0.0/4.3 GiB objects
Result logdir: /home/ray/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)
+----------------------+------------+------------------+------+----------------+--------+---------+--------------------+--------------------+------------------+
| Trial name           | status     | loc              | iter | total time (s) |     ts |  reward | episode_reward_max | episode_reward_min | episode_len_mean |
|----------------------+------------+------------------+------+----------------+--------+---------+--------------------+--------------------+------------------|
| PPO_None_45c17_00000 | TERMINATED | 172.17.0.3:18067 |  125 |         1926.7 | 500000 | 164.667 |                200 |                 15 |          164.667 |
+----------------------+------------+------------------+------+----------------+--------+---------+--------------------+--------------------+------------------+

2023-05-03 08:18:49,574 INFO tune.py:747 -- Total run time: 1947.04 seconds (1946.36 seconds for the tuning loop).


For both runs, --stop-reward was set to 190.
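For reference, the server is the process that runs Tune and produces the status tables above. Below is a simplified sketch of how the example server is set up and how the stop criterion is wired; it paraphrases the example rather than copying it, and the port 9900, num_workers=0, and the mapping of --stop-reward to stop={"episode_reward_mean": 190} are assumptions based on the example's defaults.

```python
import gym
import ray
from ray import tune
from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"
SERVER_PORT = 9900  # assumed default port of the example

def _input(ioctx):
    # The server-side rollout worker does not step an env itself; instead it
    # opens an HTTP endpoint that connected clients push their experiences into.
    return PolicyServerInput(ioctx, SERVER_ADDRESS, SERVER_PORT)

dummy_env = gym.make("CartPole-v0")

config = {
    # No env on the server side; spaces must be given explicitly because
    # all samples come from the external clients.
    "env": None,
    "observation_space": dummy_env.observation_space,
    "action_space": dummy_env.action_space,
    "input": _input,
    "num_workers": 0,        # assumption: a single listener / single sampler
    "input_evaluation": [],  # disable off-policy estimation for external samples
}

ray.init()
tune.run("PPO", config=config, stop={"episode_reward_mean": 190.0})
```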

I observe that it took:
1. 1 client → Total run time: 154.03 seconds (152.99 seconds for the tuning loop)
2. 4 clients → Total run time: 1947.04 seconds (1946.36 seconds for the tuning loop)

In conclusion, the 4-client run took roughly 12.6x longer (1947.04 s vs. 154.03 s), and the table above shows it terminated at 500,000 timesteps with a mean reward of 164.667, i.e., without even reaching the 190 stop-reward.

  • Why does running multiple clients take longer?
  • Shouldn't having more sample batches sent in parallel via multiple clients reduce training time, at least in theory? (See the back-of-the-envelope calculation after this list.)
  • Scaling training across multiple workers is the basis of RLlib and Ray, so it seems counterintuitive that adding clients does not reduce training time.
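To spell out the expectation behind the second bullet, here is a back-of-the-envelope calculation using only the numbers from the two runs above. It assumes sampling dominates the run time and that client throughput adds up linearly; both are assumptions, not measurements.

```python
# Measured, 1 client: 52,000 timesteps in 154.03 s.
single_client_time_s = 154.03
single_client_ts = 52_000
throughput_1 = single_client_ts / single_client_time_s          # ~337 ts/s

# Naive expectation for 4 clients, if throughput scaled linearly.
n_clients = 4
expected_throughput = n_clients * throughput_1                   # ~1350 ts/s
expected_time_for_52k = single_client_ts / expected_throughput   # ~38.5 s

# Measured, 4 clients: 500,000 timesteps in 1947.04 s.
measured_throughput_4 = 500_000 / 1947.04                        # ~257 ts/s

print(f"expected: {expected_throughput:.0f} ts/s, "
      f"measured with 4 clients: {measured_throughput_4:.0f} ts/s")
```

So instead of the sample throughput going up ~4x, the measured timestep throughput with 4 clients is actually lower than with a single client, which is the core of my question.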

Thanks, any help is appreciated.

Ray team, any update on this would be much appreciated. This issue is a roadblock for us.