Future of ray.init(remote address)

shiranbi · April 27, 2023, 4:40am

None: Just asking a question out of curiosity

Hi, I understand you are encouraging people to use ray jobs submission instead of using ray.client to connect to a remote cluster.
However these are two completely different uses cases. In case I have very short lived “jobs” where the flow of the job may change according to my inputs, then the job api isn’t useful for me

I need to understand if you plan to deprecate this in the future, as if it is I will need to stop using ray.

pcmoritz · April 27, 2023, 8:09pm

Hey shiranbi,

Thanks for your message! We are not currently planning to deprecate the Ray Client, but we do encouraging people to use Ray Jobs where appropriate since the Ray Client has some downsides especially for production scenarios (e.g. library versioning between client and cluster, the need for a long running connection between client and server and the client needing to stay active).

We are interested in understanding Ray Client use cases better, if you could give us more insights into your use case (maybe with example code), we would very much appreciate it. In some cases there are better solutions that are more robust.

All the best,
Philipp.

DannyChen · May 2, 2023, 2:11am

Hi Philipp,

At least in our use case, ray.client(remote_addr) is useful when doing interactive debugging and analysis. The programmer can quickly try out different implementations of a function foo in jupyter notebook (running off the cluster), and re-run @ray.remote def foo(...): to quickly harness high degree of parallelism of a cluster and get some data analysis results within seconds. Then the programmer can try out a new idea (change a few lines in foo, remove the decorator to run it locally for a quick test (1 second, small input), and add the decorator and run it on remote (3-5 second, huge input).

Using ray job submit would require separately editing a .py file and submit the script (which might not be syntactically correct etc.), adding quite some friction and (human-perceived) latency during interactive debugging and analysis.

So, although I totally agree that long-running jobs are best suited for job submit API, please consider keeping ray.client for this interactive debugging use case!

shiranbi · May 2, 2023, 4:57am

Hi @pcmoritz
My use cases is as follows:

Right now I create a connection after I run the cluster and keep it open for days (haven’t gotten to the state where it is actually running for days but the system is supposed to run for a long time)

However every 1 minute to 1 hour (different use cases) I plan to close all my existing actors and run a new bunch of actors (so technically I can call the init each time I create the actors and close the connection afterwards if this is a better way to go)

The actors create their state during their initialization and it isn’t supposed to change during their runtime, this is used for caching more than state so calling individual tasks are not useful

In the time I have the actors running I get new data from an outside source every 1 sec.

When the data is received, I call several actors (simultaneously on many different servers but also one after another – output of one is passed as input to another. Since the additional overhead of triggering a function on an actor is in the milliseconds as far as I measured, I didn’t see a problem with that)

When the last actor is done, I get the output in the client (small data) and pass it on to the next system

As far as I understand this doesn’t fit the submit jobs use case

If I didn’t understand correctly please let me know

If you think ray isn’t fit for this use case please also let me know

yic · May 2, 2023, 7:04pm

Hi @shiranbi
IIUC, you got the pipeline like:

data ==(1)==> (Ray) ==(2)==> output ==> other system

This is organized with a driver and the driver is not run within the ray cluster, but outside of the ray cluster using ray client.

And you can’t log into your cluster and run the driver there (network setup or security issues).

Ray client gives you one way to interact with what you deployed in the cluster. (1) and (2)

For (1) and (2) if there are other ways to interact with the cluster, it might also works for you, but if the syntax is different, it’ll increase your developing cost.

Am I correct about this?

shiranbi · May 3, 2023, 2:40pm

Sorry I don’t really understand what (1) and (2) are. are they tasks needed to be done on a specific data?

first of all the way I interact with the outside world is through grpc service which I put as part of a process that interacts with the cluster through ray client. I am not sure how to run this process within the cluster

I would be interested in other ways to interact with the cluster. Different syntax doesn’t automatically increase my development cost so I would be happy to understand what it is

yic · May 3, 2023, 10:31pm

@shiranbi sorry for the confusion here.

(1) and (2) only mean how data is passed to ray and how data is passed to output.

Now it’s ray client (gRPC).

shiranbi · May 5, 2023, 5:39am

@yic any suggestions though?
on how to run a process that interacts via grpc to the outside world within the cluster
or working any other way that will still fit my use case?

yic · May 5, 2023, 8:46pm

Hi @shiranbi

Ray client is implemented based on gRPC. So whenever you use ray client, the interaction between your code and ray cluster actually is via gRPC.

There is some gaps for complicated use cases, especially when there is some context set in the code, because ray primitives are sent through gRPC, so context setup might not be there in the server. But I do think this is a convenient way to interact with ray cluster.

shiranbi · May 7, 2023, 11:40am

So for now do you suggest I keep using ray client to connect to my cluster?
If not what would you recommend?
Do you think my use case doesn’t fit ray system at all and I should stop using it?

shiranbi · May 14, 2023, 6:11am

is there an option to run a script (on its on virtual environment) inside the head node
as a part of the ray start command perhaps?
This way the client would be running as part of the cluster

Topic		Replies	Views
Ray.init not work, but ray job submit is	3	214	July 29, 2024
Options other than using ray client Ray Client	1	58	July 23, 2024
Proper pattern to use from Django Ray Core	6	164	July 4, 2024
Does Java API have something like ray.client().connect()	2	807	August 28, 2021
How to submit a job to a local_mode cluster	3	553	February 27, 2021

Future of ray.init(remote address)

Related topics