Better machine, worse performance

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.


I have a VDS (virtual dedicated server, 24 logical cores) and recently rented a fully dedicated server (DS, 64 logical cores). My code performs much worse on the DS even though its resources are far better than those of the VDS. To narrow down the issue, I tested the simple code below:

import time
import random
import ray


@ray.remote  # required so that do_some_work.remote(...) works below
def do_some_work(x):
    time.sleep(random.uniform(0, 10) / 100)  # sleep for 0 to 0.1 seconds
    return x

def process_incremental(total, result):
    return total + result

start = time.time()
result_ids = [do_some_work.remote(x) for x in range(10000)]
total = 0  # named "total" to avoid shadowing the built-in sum()
while len(result_ids):
    # ray.wait returns one finished task by default, plus the remaining ones
    done_id, result_ids = ray.wait(result_ids)
    total = process_incremental(total, ray.get(done_id[0]))
print("duration =", time.time() - start, "\nresult = ", total)

This code runs as expected: the VDS finishes it in ~21.5 seconds and the DS in ~8.5 seconds. My actual code, however, takes ~6 seconds on the VDS but ~40 seconds on the DS. Do you have any idea why it performs worse on the better machine, even though the test code above behaves as expected? I would ask the provider of both servers (the same company, Contabo) about this, but since the test code above runs as expected, that may be pointless.

Thanks for the help.

One possibility is unexpected thrashing: if each task in your code uses multiple threads, more parallelism doesn’t help and just causes more contention. Setting the environment variable OMP_NUM_THREADS=1 (or a similar thread-count setting) sometimes helps with this.
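
As a rough sketch of the thread-capping idea, the snippet below sets the thread-count variables from inside Python. The helper name limit_native_threads is made up for illustration, and the list of variables assumes the code relies on OpenMP, MKL, or OpenBLAS; other libraries may use different knobs.

```python
import os

# Assumption: these are the usual thread-count variables for OpenMP, MKL
# and OpenBLAS. Your libraries may read different ones.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_native_threads(n=1, env=os.environ):
    """Hypothetical helper: cap native thread pools at n threads each.

    Call it before importing numpy/scipy/torch, since those libraries
    typically read these variables once, when they are first loaded.
    """
    for name in THREAD_ENV_VARS:
        env[name] = str(n)
    return {name: env[name] for name in THREAD_ENV_VARS}


limit_native_threads(1)
```

The variables have to be in place before the Ray worker processes start, so exporting them in the shell before launching the script is the simplest route. Passing them through ray.init(runtime_env={"env_vars": {...}}) is another option worth trying.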

It could also be that the code is simply slower on one of the machines. What happens if you call ray.init(num_cpus=8) on both machines so that they run with the same fixed parallelism?
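
A minimal sketch of that fixed-parallelism comparison is below. The cpu_bound function is only a stand-in for the real workload, and timed and run_fixed are illustrative names, not part of Ray. Ray is imported inside run_fixed so the timing helper can also be used on its own.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def cpu_bound(n):
    # Hypothetical stand-in for the real per-task work: sum of squares below n.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_fixed(num_cpus=8, n_tasks=64, work=200_000):
    """Run n_tasks copies of cpu_bound on a Ray instance limited to num_cpus."""
    import ray  # imported lazily; requires Ray to be installed

    ray.init(num_cpus=num_cpus)
    try:
        remote_work = ray.remote(cpu_bound)
        return timed(
            lambda: ray.get([remote_work.remote(work) for _ in range(n_tasks)])
        )
    finally:
        ray.shutdown()
```

Calling run_fixed(8) on both machines and comparing the elapsed times gives a like-for-like number. Running timed(cpu_bound, 200_000) without Ray additionally shows whether a single core on the DS is slower than one on the VDS.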