Hi, I’m running Ray directly on a compute node with 128 cores and have run into some problems with setting num_cpus.
Example
import ray
import time

# Each task requests 50 CPUs
@ray.remote(num_cpus=50)
def test_ray(num):
    print(num)
    time.sleep(5)

if __name__ == '__main__':
    # Submit 200 tasks, then block until all of them finish
    futures = [test_ray.remote(i) for i in range(200)]
    ray.get(futures)
Computer info
$ cat /proc/meminfo |grep MemTotal
MemTotal: 528083404 kB
$ nproc --all
128
$ uname -a
Linux node1 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Problems
When I don’t set num_cpus, it works well:
(test_ray pid=139395) 107
(test_ray pid=139397) 106
(test_ray pid=139405) 105
(test_ray pid=139398) 104
(test_ray pid=139399) 103
(test_ray pid=139404) 102
(test_ray pid=139409) 101
(test_ray pid=139402) 99
(test_ray pid=139400) 100
(test_ray pid=139401) 98
(test_ray pid=139406) 97
(test_ray pid=139407) 96
(test_ray pid=139408) 95
(test_ray pid=139411) 93
(test_ray pid=139410) 94
(test_ray pid=139413) 92
(test_ray pid=139412) 91
(test_ray pid=139415) 90
(test_ray pid=139422) 89
....
...
When I set num_cpus=50, it only uses two cores (only two worker PIDs show up):
(test_ray pid=142874) 0
(test_ray pid=142872) 1
(test_ray pid=142874) 3
(test_ray pid=142872) 2
....
When I increase it to 100, the tasks run one by one.
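These numbers would be consistent with Ray treating num_cpus as a per-task reservation against the node's 128 CPUs, so that the number of tasks running at once is the integer quotient of the two. Below is a minimal sketch of that arithmetic, assuming this is how the scheduling works. The helper max_concurrent_tasks is a hypothetical name for illustration, not part of the Ray API, and it only handles whole-number requests.

```python
def max_concurrent_tasks(total_cpus: int, cpus_per_task: int) -> int:
    """Hypothetical helper (not a Ray API): how many tasks fit at once
    if every task reserves `cpus_per_task` out of `total_cpus`."""
    if total_cpus < 0 or cpus_per_task <= 0:
        raise ValueError("total_cpus must be >= 0 and cpus_per_task must be > 0")
    # Integer division: leftover CPUs that cannot fit a whole task stay idle
    return total_cpus // cpus_per_task


if __name__ == "__main__":
    total = 128  # from `nproc --all` on this node
    for request in (1, 50, 100):
        print(f"num_cpus={request}: {max_concurrent_tasks(total, request)} task(s) at a time")
```

Under that assumption, a request of 50 gives 128 // 50 = 2 concurrent tasks and a request of 100 gives 128 // 100 = 1, which matches the two PIDs and the one-by-one behaviour above.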
Is there any principle for setting num_cpus?