How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
The setup: the Ray 1.13 image is used to run the experiment.
I ran the CartPole client/server scripts from the RLlib serving examples; a condensed sketch of what they do is below.
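For context, this is roughly what the two example scripts do (they run as two separate processes). It is a sketch based on the Ray 1.13 CartPole serving example, not a verbatim copy; the host/port and the observation/action space definitions here are assumptions. In this setup the server owns the PPO training loop, while each client runs its own CartPole env and feeds experiences to the server.

```python
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_client import PolicyClient
from ray.rllib.env.policy_server_input import PolicyServerInput


def run_server():
    # Roughly cartpole_server.py: the trainer reads experiences from a
    # PolicyServerInput instead of sampling a local env, so the obs/action
    # spaces must be specified explicitly.
    def _input(ioctx):
        return PolicyServerInput(ioctx, "localhost", 9900)

    config = {
        "env": None,
        "observation_space": gym.spaces.Box(float("-inf"), float("inf"), (4,)),
        "action_space": gym.spaces.Discrete(2),
        "input": _input,
        "input_evaluation": [],
        "num_workers": 0,
    }
    ray.init()
    trainer = PPOTrainer(config=config)
    while True:
        print(trainer.train())


def run_client():
    # Roughly cartpole_client.py: each client steps its own CartPole env and
    # ships the collected experiences to the server; with inference_mode
    # "local" it also pulls policy weights and computes actions locally.
    client = PolicyClient("http://localhost:9900", inference_mode="local")
    env = gym.make("CartPole-v0")
    obs = env.reset()
    episode_id = client.start_episode(training_enabled=True)
    while True:
        action = client.get_action(episode_id, obs)
        obs, reward, done, info = env.step(action)
        client.log_returns(episode_id, reward)
        if done:
            client.end_episode(episode_id, obs)
            obs = env.reset()
            episode_id = client.start_episode(training_enabled=True)
```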
First, I ran a single client against the server. The results:
1 client 1 server →
== Status ==
Current time: 2023-05-03 07:37:58 (running for 00:02:33.03)
Memory usage on this node: 6.0/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs, 0.0/8.61 GiB heap, 0.0/4.3 GiB objects
Result logdir: /home/ray/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)
+----------------------+------------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
| Trial name           | status     | loc             |   iter |   total time (s) |    ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------+------------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------|
| PPO_None_bdb63_00000 | TERMINATED | 172.17.0.3:8687 |     13 |          145.387 | 52000 |   190.71 |                  200 |                   31 |             190.71 |
+----------------------+------------+-----------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+
2023-05-03 07:37:58,784 INFO tune.py:747 -- Total run time: 154.03 seconds (152.99 seconds for the tuning loop).
Then I ran 4 clients against the same server.
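Concretely, this just meant starting four copies of the client script against the running server. A rough launcher sketch; the script name and flag follow the Ray 1.13 serving example but should be treated as assumptions here (and depending on how the server's workers/ports are set up, each client may need to target a different port):

```python
import subprocess

# Start 4 client processes in parallel against the same policy server.
# "cartpole_client.py" and its flag are assumed to match a local copy of
# the Ray 1.13 serving example; adjust the path/port as needed.
procs = [
    subprocess.Popen(["python", "cartpole_client.py", "--inference-mode=local"])
    for _ in range(4)
]
for p in procs:
    p.wait()
```

The 4-client results: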
== Status ==
Current time: 2023-05-03 08:18:49 (running for 00:32:26.40)
Memory usage on this node: 9.5/15.4 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/12 CPUs, 0/0 GPUs, 0.0/8.6 GiB heap, 0.0/4.3 GiB objects
Result logdir: /home/ray/ray_results/PPO
Number of trials: 1/1 (1 TERMINATED)
+----------------------+------------+------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
| Trial name           | status     | loc              |   iter |   total time (s) |     ts |   reward |   episode_reward_max |   episode_reward_min |   episode_len_mean |
|----------------------+------------+------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------|
| PPO_None_45c17_00000 | TERMINATED | 172.17.0.3:18067 |    125 |           1926.7 | 500000 |  164.667 |                  200 |                   15 |            164.667 |
+----------------------+------------+------------------+--------+------------------+--------+----------+----------------------+----------------------+--------------------+
2023-05-03 08:18:49,574 INFO tune.py:747 -- Total run time: 1947.04 seconds (1946.36 seconds for the tuning loop).
For both runs, --stop-reward was 190 (a sketch of how that maps onto the Tune stopping criterion is after these numbers).
I observe that it took:
1. 1 client → Total run time: 154.03 seconds (152.99 seconds for the tuning loop)
2. 4 clients → Total run time: 1947.04 seconds (1946.36 seconds for the tuning loop)
In conclusion, with 4 clients it took about 12x longer to converge to the stop-reward of 190.
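For reference, a minimal sketch of how a --stop-reward of 190 typically turns into a Tune stopping criterion; the config below is a placeholder for illustration only, not the serving config sketched earlier:

```python
from ray import tune

# Stop the trial once the mean episode reward reaches 190; this mirrors
# what the example's --stop-reward flag feeds into tune.run().
stop = {"episode_reward_mean": 190}
tune.run(
    "PPO",
    config={"env": "CartPole-v0"},  # placeholder config for illustration
    stop=stop,
    verbose=2,
)
```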
- Why does running multiple clients take longer?
- Shouldn't having more sample batches sent in parallel via multiple clients reduce training time, at least in theory?
- Scaling training via multiple workers is the basis of RLlib and Ray, so it seems counterintuitive that more clients do not reduce training time.
Thanks, any help is appreciated.