1. Severity of the issue: (select one)
Medium: Significantly affects my productivity but can find a workaround.
2. Environment:
- Ray version: 2.44.0
- Python version: 3.12
- OS: Ubuntu 22.04
- Cloud/Infrastructure: on-prem
- Other libs/tools (if relevant):
3. What happened vs. what you expected:
- Expected: ray.wait with fetch_local=False should not bring data into the Ray head node's object store.
- Actual: The object store of the Ray head node fills up.
I am running a Ray head node plus one Ray worker node, started as follows:
ray start --head --port=9654 --num-cpus=0 --block --dashboard-host=0.0.0.0 --object-store-memory=4294967296
ray start --address=172.17.0.5:9654 --num-cpus=0 --block --resources='{"Foo":7}' --object-store-memory=4294967296
I am connecting to the cluster from outside through Ray Client as follows:
context = ray.init(address="ray://172.17.0.5:10001")
I have an actor with the following method:
@ray.method(num_returns=3)
def run_task_return_big_nd_array(self) -> tuple[bool, NDArray, float]:
    # 10000 x 10000 uint16 values, about 200 MB per call
    big_array = np.zeros((10000, 10000), dtype=np.uint16)
    end_time = time.time()
    done = True
    return done, big_array, end_time
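For scale, the arithmetic behind that array can be sketched as below. This is only a rough estimate: it uses the shape, dtype, and --object-store-memory value quoted in this post, and it ignores any per-object overhead in the object store. The helper names are hypothetical, chosen for illustration.

```python
# Rough size estimate for the array returned by run_task_return_big_nd_array.
# Values are taken from the post; helper names are illustrative only.

ROWS, COLS = 10_000, 10_000
BYTES_PER_UINT16 = 2                  # np.uint16 is 2 bytes per element
OBJECT_STORE_BYTES = 4_294_967_296    # --object-store-memory (4 GiB)


def array_nbytes(rows: int, cols: int, itemsize: int) -> int:
    """Size in bytes of a dense rows x cols array."""
    return rows * cols * itemsize


def arrays_that_fit(store_bytes: int, array_bytes: int) -> int:
    """How many whole arrays fit in the store, ignoring overhead."""
    return store_bytes // array_bytes


array_bytes = array_nbytes(ROWS, COLS, BYTES_PER_UINT16)
print(array_bytes)                                        # 200000000
print(arrays_that_fit(OBJECT_STORE_BYTES, array_bytes))   # 21
```

So if every returned array is kept alive (as the refs list in the test code does), roughly 21 iterations would be enough to fill a 4 GiB object store on any node that holds a copy.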
The test code is as follows:
ray_actor = MaximizerRayActor.options(resources={"Foo": 1}).remote()
ray_actor.wait.remote()
refs = []
for i in range(test_loops):
    done, res_ref, time_ref = ray_actor.run_task_return_big_nd_array.remote()
    ready, not_ready = ray.wait([done, res_ref, time_ref], num_returns=3, fetch_local=False)
    # d = ray.get(done)
    refs.append([done, res_ref, time_ref])
time.sleep(10)
ray.kill(ray_actor)
With the test code written this way, I see the object store of the Ray head node growing (as well as the worker's), even though fetch_local=False is passed to ray.wait. I don't understand why.
If I remove the ray.wait call and use d = ray.get(done) instead, the head node's object store does not grow, and I still know that the task is complete.
How can I use ray.wait to find out that the task is complete without bringing the data over?