Ray on multiple company collaborations

alexanderzjs · February 10, 2022, 1:34am

Hello, I am new to Ray and am looking for a solution to the following scenario. I would like to know whether Ray is a good fit for it.

Suppose there are multiple companies, say A and B (in reality, there could be more). They want to collaboratively compute something with their own data. For example, company A has inputs x_A and y_A, company B has inputs u_B and v_B, and they want to compute f(x_A, y_A, u_B, v_B) = (x_A + u_B) * (y_A + v_B).

According to a Slack discussion, there are two possible options for this problem using a Ray cluster (thanks to Ray Lover and Will Drevo for the suggestions).

1. They build a common Ray cluster on a cloud service provider such as AWS or Azure. Then, both companies use Ray client to pass their data to this common cluster for further computation.
2. Each of them has its own Ray cluster locally. Then, each of them can use its own Ray client to remotely access the other’s Ray cluster.

Both of them should work, and I would like to hear about more options for this scenario, if there are any.

Thanks in advance!
Alexander

yic · February 10, 2022, 7:34pm

Do you mind telling me the benefit of doing this instead of just putting the data someplace and pulling it in the script?

alexanderzjs · February 11, 2022, 1:46am

Hi @yic, the actual requirement is data privacy: due to laws and regulations, a company’s sensitive data cannot be outsourced anywhere for computation, not even to a third party.

bentay · February 11, 2022, 2:47pm

Can the data from company A be shared with company B and vice versa? Or must the computation be completed in a privacy-preserving fashion, i.e. A never gets to see B’s data in the clear and B never gets to see A’s data in the clear?

yic · February 11, 2022, 8:34pm

Thanks @alexanderzjs the requirement makes sense.

Maybe I misunderstood something, but I guess you want A, B to upload their data to someplace and they can’t see data the others uploaded. Once everything is ready, the system will just run the code. Is this the pattern you want to achieve?

If this is true, maybe

They build a common Ray cluster in cloud service provider such as AWS, Azure. Then, both companies use Ray client to pass in their data to this common cluster for further computation.

will work.

For example, you can have a detached actor there, perhaps with an API like:

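import ray

@ray.remote
class Algo:
    def __init__(self):
        self._params = []
        self._result = None

    def upload(self, param, data):
        self._params.append([param, data])
        # everything_is_ready() and f() are placeholders for your own logic.
        if everything_is_ready(self._params):
            self._result = f(self._params)

    def get_result(self):
        return self._result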

But still, to prevent data leaks, you probably need other safeguards, such as making sure the long-running actor is running the right code and that there is no easy way to log into the cluster and read the data in the object store.
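A minimal sketch of how such an actor might look for the example function f from the first post, assuming each company uploads its two inputs under a party label. The names `JointComputation`, `EXPECTED_PARTIES`, `start_shared_actor`, `upload_from_client`, the actor name "algo", and the namespace "collab" are all hypothetical. Ray is imported inside the helper functions so the plain class can be exercised without a cluster. This only shows the plumbing; it does not by itself hide the inputs from whoever operates the cluster.

```python
EXPECTED_PARTIES = {"A", "B"}  # hypothetical party labels


class JointComputation:
    """Collects one (first, second) input pair per party, then computes f."""

    def __init__(self):
        self._inputs = {}
        self._result = None

    def upload(self, party, first, second):
        if party not in EXPECTED_PARTIES:
            raise ValueError(f"unknown party: {party}")
        self._inputs[party] = (first, second)
        if set(self._inputs) == EXPECTED_PARTIES:
            x_a, y_a = self._inputs["A"]
            u_b, v_b = self._inputs["B"]
            # f(x_A, y_A, u_B, v_B) = (x_A + u_B) * (y_A + v_B)
            self._result = (x_a + u_b) * (y_a + v_b)
        # True once the result is available, False otherwise.
        return self._result is not None

    def get_result(self):
        # None until both parties have uploaded.
        return self._result


def start_shared_actor():
    """Run once on the shared cluster; the actor outlives this driver."""
    import ray  # imported here so the class above is usable without Ray

    ray.init(address="auto", namespace="collab")
    actor_cls = ray.remote(JointComputation)
    return actor_cls.options(name="algo", lifetime="detached").remote()


def upload_from_client(head_address, party, first, second):
    """Run by each company; head_address is e.g. "ray://<head-host>:10001"."""
    import ray

    ray.init(address=head_address, namespace="collab")
    algo = ray.get_actor("algo")  # looked up by name in the same namespace
    return ray.get(algo.upload.remote(party, first, second))
```

Each company would call `upload_from_client` with its own label and inputs, and either side could later fetch the actor by name and call `get_result`.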

alexanderzjs · February 12, 2022, 1:32am

In my case, it is the second one: data from either company should never be shared with the other, and the computation should be done in a privacy-preserving fashion.

alexanderzjs · February 12, 2022, 1:47am

Well, as I explained in my previous reply to @bentay, all computations should be done in a privacy-preserving fashion. (The security assumption here is that each company trusts only itself. They do not trust a third party, since the third party may collude with one of the companies.)

Given this, what they can do is encrypt their data/model parameters and send them to the other party’s Ray cluster, which then computes over the encrypted data/model parameters. I am looking for a way to do this.

For example, assume company A has x=10 and company B has y=20 (suppose x and y are already in encrypted form for simplicity). Company A has a Ray cluster and has already put x in its object store:

Now, company B needs to retrieve x and compute x*y, so B uses a Ray client to connect to A’s cluster and needs to get x from the object store like the following:

I am not sure how this can be realized, since B cannot directly look up x_ref in the object store.

yic · February 12, 2022, 5:12am

For this case, I think you need to store it in a detached actor with a name (a dict?), and B uses that actor to retrieve the object.
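One way this suggestion might be sketched, assuming the (already encrypted) value is stored directly in a dict held by a named, detached actor on A’s cluster. The names `SharedStore`, `publish`, `fetch`, the actor name "store", and the namespace "shared" are hypothetical, and Ray is imported inside the helpers so the plain class can be tried without a cluster.

```python
class SharedStore:
    """A dict-backed store, intended to run as a named, detached Ray actor."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        # Returns None if nothing has been stored under this key.
        return self._items.get(key)


def publish(key, value):
    """Company A: run on its own cluster to expose a value under a key."""
    import ray  # imported here so SharedStore is usable without Ray

    ray.init(address="auto", namespace="shared")
    store = ray.remote(SharedStore).options(
        name="store", lifetime="detached"
    ).remote()
    # Wait for the write to finish before this driver exits.
    ray.get(store.put.remote(key, value))


def fetch(head_address, key):
    """Company B: connect with Ray client and look the value up by key."""
    import ray

    # head_address is e.g. "ray://<A-head-host>:10001"
    ray.init(address=head_address, namespace="shared")
    store = ray.get_actor("store")  # found by name, no ObjectRef needed
    return ray.get(store.get.remote(key))
```

Because B locates the actor by name and namespace, it never needs A’s x_ref. Anything B can reach through this actor is visible to B, so only values that are safe to share (for example, ciphertexts) should be stored in it.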

alexanderzjs · February 12, 2022, 9:42am

Would you mind giving a very concise code snippet to show it?

yic · February 14, 2022, 7:52pm

Would what I mentioned above (class Algo) work?

jovany-wang · February 15, 2023, 5:22pm

Hi @alexanderzjs, sorry for the delayed reply.

Your case is a basic MPC algorithm or a basic federated learning scenario.

I believe what concerns you is that Ray’s protocols are complex and hard to control, e.g. B is able to retrieve A’s data by using one of many Ray APIs (ray.get(), f.remote(), and more).

We have proposed RayFed, a connector layer that lets users build federated learning or privacy-preserving computing applications on top of Ray.
You can also see our initial proposal page for more details on how we preserve privacy in Ray.
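To make the MPC framing concrete, here is a small single-process simulation of how f from the first post could be evaluated on additive secret shares, with the multiplication done via a Beaver triple. This is only an illustrative sketch, not how RayFed or SecretFlow implement it. The modulus `P`, all function names, and in particular the trusted dealer that generates the triple are assumptions; a dealer conflicts with the "trust only yourself" model described earlier and would in practice be replaced by a two-party triple-generation protocol.

```python
import secrets

P = 2**61 - 1  # modulus for all share arithmetic (an arbitrary choice here)


def share(value):
    """Split value into two additive shares that sum to value mod P."""
    r = secrets.randbelow(P)
    return r, (value - r) % P


def reconstruct(s0, s1):
    return (s0 + s1) % P


def add_shares(p, q):
    """Each party adds its own two shares locally; no communication."""
    return (p[0] + q[0]) % P, (p[1] + q[1]) % P


def beaver_multiply(s, t):
    """Multiply two shared values using a triple from a hypothetical dealer."""
    a, b = secrets.randbelow(P), secrets.randbelow(P)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % P)
    # The parties open d = s - a and e = t - b. Since a and b are uniformly
    # random, d and e on their own reveal nothing about s or t.
    d = reconstruct((s[0] - a_sh[0]) % P, (s[1] - a_sh[1]) % P)
    e = reconstruct((t[0] - b_sh[0]) % P, (t[1] - b_sh[1]) % P)
    # s*t = (a + d)(b + e) = c + d*b + e*a + d*e; party 0 adds the public d*e.
    z0 = (c_sh[0] + d * b_sh[0] + e * a_sh[0] + d * e) % P
    z1 = (c_sh[1] + d * b_sh[1] + e * a_sh[1]) % P
    return z0, z1


def joint_f(x_a, y_a, u_b, v_b):
    """Simulates both parties; only the final product is reconstructed."""
    s = add_shares(share(x_a), share(u_b))  # shares of x_A + u_B
    t = add_shares(share(y_a), share(v_b))  # shares of y_A + v_B
    return reconstruct(*beaver_multiply(s, t))
```

In a real deployment each party would hold only its own half of every share pair, and the "open" steps would be messages exchanged between the two clusters. Results are correct as long as the true value of f is below `P`.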

alexanderzjs · February 21, 2023, 2:12am

Hi, @jovany-wang, thanks for your reply.

Yes, I was looking for something that could integrate federated learning into Ray, and I am happy to learn that you have developed such a framework. As far as I know, there is another project, secretflow/secretflow on GitHub ("A unified framework for privacy-preserving data analysis and machine learning"), which integrates MPC into Ray to support privacy-preserving applications.

Both are great and I appreciate your great contributions.

jovany-wang · February 21, 2023, 7:24am

Yes. SecretFlow is now built on RayFed.
