High wait_for_function overhead for new tasks

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

TLDR: I see 800 ms ~ 1.1 s of wait_for_function overhead on each worker on a single machine. Is this normal? Can I do something about it?

Hi, I’m using Ray Core to parallelize a machine learning explainability algorithm that we are developing. The basic idea is to recursively build a binary tree that splits the sampled data into two subdomains at each node, eventually obtaining an approximation of the actual black-box model.

In the current implementation I’m simply using Ray tasks to handle each new sub-node of the tree and to compute the best split point for each dimension of each node in parallel.

I managed to avoid blocking tasks during the recursive tree construction by simply building a tree of remote references and only resolving them later, through an async recursive function.
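Roughly, the pattern looks like the sketch below (the names and the placeholder split are made up, and in the real code the tree of references is resolved through an async recursive function rather than a plain recursive ray.get):

import ray

ray.init()

@ray.remote
def build_node(data, depth):
    # Leaf: nothing left to split.
    if depth == 0 or len(data) < 2:
        return {"leaf": True, "size": len(data)}
    # Placeholder for the real best-split computation.
    split = len(data) // 2
    # Submit the children and return their ObjectRefs directly,
    # without blocking on them here.
    left_ref = build_node.remote(data[:split], depth - 1)
    right_ref = build_node.remote(data[split:], depth - 1)
    return {"leaf": False, "left": left_ref, "right": right_ref}

def resolve(node_ref):
    # Walk the tree of references afterwards; each ray.get fetches one node
    # dict whose children are still ObjectRefs.
    node = ray.get(node_ref)
    if not node["leaf"]:
        node["left"] = resolve(node["left"])
        node["right"] = resolve(node["right"])
    return node

tree = resolve(build_node.remote(list(range(100)), 3))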

I tested the code on my desktop machine (8 cores / 16 threads, AMD Ryzen 3700X, Windows) and observed that it takes a fairly substantial workload to get any speedup, even though CPU utilization was quite high.

I used ray timeline with RAY_PROFILING=1 and generated the following graph:

From the graph I noticed that on each worker, when it first executes a task, it spends a substantial amount of time in wait_for_function, from 700 ms to 1.1 s.
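For reference, I exported the timeline roughly like this (a sketch; the output filename is arbitrary, and RAY_PROFILING=1 was set in the environment before starting Ray):

import ray

# RAY_PROFILING=1 is set in the environment before Ray starts.
ray.init()

# ... run the workload ...

# Dump the Chrome-tracing JSON that I opened to produce the graph above.
ray.timeline(filename="timeline.json")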

Looking at the Ray code, my understanding is that this time is basically spent shipping the function to the worker so that it can execute it; not the arguments, just the function.

Now my first question is: is this overhead normal and expected, or is it something specific to our code?

Second question: is there a way to send the functions to the workers eagerly, so the overhead is not paid during the first call to each worker?

I can see that each worker only needs to wait for the function once, and after that it is quite efficient.

Basically, I would like to know whether there is something I can do to reduce the wait_for_function overhead, or whether I can work around it by sending the function eagerly at the start.
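For example, I was imagining something along these lines as a workaround (just a sketch of the idea; split_node stands in for our real task, and I realize the scheduler gives no guarantee that each warm-up task lands on a distinct worker):

import ray

ray.init()

@ray.remote
def split_node(data):
    # Stand-in for the real per-node computation.
    return len(data)

# Fire one trivial task per available CPU so that, hopefully, every worker
# pays the wait_for_function / import cost up front rather than during the
# first real call.
num_cpus = int(ray.available_resources().get("CPU", 1))
warmup = [split_node.remote([]) for _ in range(num_cpus)]
ray.get(warmup)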

@Al12rs one possible cause of this slowdown is Python import time, as 700 ms ~ 1 s is a bit too high. Another, less likely, possibility is the serialization/deserialization of your remote function.

Is it possible for you to measure the Python package import time without Ray? That could help us optimize your worker init speed.

If it’s indeed due to Python package imports, we are looking into the problem for Ray 2.2.

cc @sangcho as well.

Hi!

I’m not completely sure what exactly I should try measuring. The time required to run all the imports in the files where the ray.remote functions and their sub-functions are located (Ray excluded)? Or only the imports that are actually used within the remote code?

In case it’s all the imports mentioned in the file, I tried measuring it like this:

import time

# Time how long it takes to run all the module-level imports of the file
# that defines our ray.remote functions (Ray itself excluded).
start = time.time()

import os
import pickle
import warnings
import zipfile

import numpy as np

from brownian_bridge_utils import covariance_matrix
from classes import Node, OrderedSplit
from split_criterions import (compute_loglik_process,
                              compute_mse_process_online, compute_naive_r2,
                              compute_online_r2)

end = time.time()
print(end - start)

which printed: 0.7073643207550049
(the time import is only there for the measurement itself; it’s not something our code actually uses)

Is that measurement correct, and is it what you were referring to? If so, I can see how it could be contributing to the slowdown. I had assumed that only the things actually used would be imported in the remote worker, but maybe that is not the case, and using separate files with fewer imports could be beneficial?
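To make that concrete, this is the kind of restructuring I had in mind (a sketch only; compute_split is a made-up name, and I’m not sure whether moving the heavy imports inside the remote function, or into a smaller module, actually reduces the wait_for_function time):

import ray

@ray.remote
def compute_split(samples):
    # Imports deferred into the task body, so the worker only imports what
    # this task actually needs instead of everything the original module
    # pulls in at top level (numpy here as an example; the real split
    # criterion helpers would be imported the same way).
    import numpy as np

    data = np.asarray(samples)
    return float(data.mean())  # placeholder for the real best-split logic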

Could the fact that we are calling Ray remote functions from within a remote function be aggravating the situation in any way?

I really appreciate the help btw!


@Chen_Shen Courtesy ping in case you weren’t notified of the reply automatically; not sure what the etiquette is here.

Yeah it seems in your case the import time is the main overhead.

@Chen_Shen thanks for the confirmation!

You mentioned that Ray 2.2 aims to reduce this overhead; could you give me some more info on that?
For example, what approach are you planning to take to mitigate it?

If we know that future versions of Ray will be able to address this, we could avoid having to go to extremes to work around it.

Hey @Al12rs, we will add a bit more detail to [Core] Ray worker slow initialization · Issue #28789 · ray-project/ray · GitHub soon!
