Assume my code, deployed as a Ray task, contains the statement `ray.put('abc')`. Is "abc" persisted in the Plasma store on the same worker node where my task is running? Or is `ray.put('abc')` directed to the cluster head node, which decides which Plasma store to use in the entire cluster, so the object would most likely be persisted on another worker?
The Plasma store is "local only": objects are only created locally, and they are transferred across the cluster when other nodes need them. Note that there can be many copies of the same object. That means:

- `ray.put` → the object is created in the local plasma store of the node where the caller of `ray.put` runs.
- a task's return object → the object is created in the local plasma store of the node where the task is executed.
So if a task creates objects that are stored in its local Plasma store, does the Global Catalog Store also contain pointers to those objects? If yes, does a local `ray.put` also update the global catalog?
Yes. The global catalog is updated by the raylet after the object is created.
Thanks a lot for the information. Assume I have a Ray cluster with 2 workers, and I write the following program and run it on my laptop:

```python
import ray

ray.init("<IP of my Ray cluster>")
obj_id = ray.put(mydata)
```

This code is not executed on a worker, right? Does that mean that in this case `ray.put()` goes to the head node, which chooses an arbitrary Plasma store on one of the workers to store `mydata`?
The Ray head node is identical to the worker nodes, except that it runs a couple of additional components that manage the cluster. So the object will simply be stored in the head node's plasma store.
If I want a task to write data into the local plasma store on the worker node where the task is executed, is this the correct code?
```python
import ray

@ray.remote
def my_func(val):
    return ray.put(val)

ray.init()
future = my_func.remote('my-test-data')
object_id = ray.get(future)
```
Now `object_id` contains the ID of the object that was created in the local plasma store where `my_func` executed. Is this correct?
Yes, you are right! Btw, calling it an `object_ref` is more accurate there. So your driver now has a reference to a plasma object on a worker node, and the object is only pulled to the driver node when you call `ray.get` on the ref!
```python
@ray.remote
def my_func(val):
    return ray.put(val)  # Put the object on the node this task runs on.

ray.init()
future = my_func.remote('my-test-data')
object_ref = ray.get(future)  # You only have a reference to an object on a worker node now.
result = ray.get(object_ref)  # This starts pulling the object from the worker node to the driver node.
```
@sangcho In the context above…
```python
@ray.remote
def my_func(val):
    return ray.put(val)  # Put the object on the node this task runs on.

ray.init()
future = my_func.remote('my-test-data')
```
`my_func.remote()` is submitted to the local node where this code is executed. If the local node is fully utilized, will the autoscaler submit `my_func` to different nodes?
Functions are submitted to remote nodes when the local node's resources (e.g., `num_cpus`) are fully utilized. But if the local node cannot create an object because its plasma store is full, it will just raise an OOM error. From Ray 1.3 (which will be released in 2-3 weeks), object spilling will be enabled by default, so objects will be spilled to disk when the local object store is full (Memory Management — Ray v1.2.0).
Also note that the autoscaler doesn't submit functions to other nodes; that is the raylet's job, and Ray doesn't have a centralized scheduler (it uses a decentralized protocol for scalability). You can check out the whitepaper for more internal details about Ray! Ray Whitepapers — Ray v1.2.0