Hey, I have been trying to use Ray in my project to process data from three sources at the same time. I got to the point of running remote functions that receive an object as an argument, but my object uses S3 functionality to download files, and a `TypeError: can't pickle SSLContext objects` error appears.
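Background on the error: Ray serializes (pickles) every argument passed to `.remote()`, here `self`, and `ssl.SSLContext` objects cannot be pickled, so any object that holds one (presumably inside the S3 client) fails to serialize. Below is a minimal stdlib-only sketch of one common workaround, assuming the client can simply be recreated in each worker: leave it out in `__getstate__` and rebuild it lazily on first use. `S3Loader` is a hypothetical name, and a bare `SSLContext` stands in for the real S3 client.

```python
import pickle
import ssl


class S3Loader:
    """Hypothetical stand-in for an object that owns a non-picklable client."""

    def __init__(self, bucket):
        self.bucket = bucket
        self._ctx = None  # built lazily, never shipped between processes

    @property
    def ctx(self):
        # Rebuild the context on first use in whatever process we end up in.
        if self._ctx is None:
            self._ctx = ssl.create_default_context()
        return self._ctx

    def __getstate__(self):
        # Pickle every attribute except the SSLContext.
        state = self.__dict__.copy()
        state["_ctx"] = None
        return state


loader = S3Loader("my-bucket")
_ = loader.ctx  # the context now exists in this process

try:
    pickle.dumps(loader.ctx)
    raw_pickle_failed = False
except TypeError:
    raw_pickle_failed = True  # the same kind of failure Ray reports

clone = pickle.loads(pickle.dumps(loader))
print(raw_pickle_failed, clone.bucket, clone._ctx is None)
```

An alternative along the same lines is to pass only plain data (for example the bucket name) to the remote functions and create the S3 client inside them.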
This is the code I'm using (it raises the error whenever the object holds an S3 instance):
```python
import time

import ray

ray.init()
print(ray.available_resources())


@ray.remote
def f1(object_, x):
    time.sleep(5)
    return [i for i in range(100 * x)]


@ray.remote
def f2(object_, x):
    time.sleep(5)
    return [i for i in range(100 * x)]


@ray.remote
def f3(object_, x):
    time.sleep(5)
    return [i for i in range(100 * x)]


class test:
    name = "hey"
    apellido = "bye"

    @ray.remote
    def f(self, x):
        return x * x

    def run_multiprocess(self):
        # `self` is serialized and shipped with each task; this is where the
        # SSLContext error appears when the instance holds an S3 client.
        futures = [f1.remote(self, 20), f2.remote(self, 10), f3.remote(self, 5)]
        a, b, c = ray.get(futures)  # lists of 2000, 1000 and 500 ints


t = test()
t.run_multiprocess()
```
This code runs in about 9 seconds, which is more or less what I expected, although I don't know why it takes about 4 seconds to set everything up.
After that, I tried another approach using actor methods:
```python
import time

import ray

ray.init()


@ray.remote
class TrainInputs:
    @ray.method(num_returns=1)
    def f1(self, x):
        time.sleep(5)
        return [i for i in range(100 * x)]

    @ray.method(num_returns=1)
    def f2(self, x):
        time.sleep(5)
        return [i for i in range(100 * x)]

    @ray.method(num_returns=1)
    def f3(self, x):
        time.sleep(5)
        return [i for i in range(100 * x)]


class Train:
    def __init__(self):
        # A single actor instance handles all three calls below.
        self.t = TrainInputs.remote()

    def get_inputs_train(self):
        print(ray.available_resources())
        futures = [self.t.f1.remote(20), self.t.f2.remote(10), self.t.f3.remote(5)]
        a, b, c = ray.get(futures)


train = Train()
train.get_inputs_train()
ray.shutdown()
```
But the execution takes 19.7 seconds, so there is no improvement at all: that is roughly 3 × 5 seconds plus setup, as if the three calls ran one after another.
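One likely explanation for the 19.7 seconds: by default, a Ray actor executes the method calls it receives one at a time, so three 5-second calls on a single actor add up to about 15 seconds. The sketch below does not use Ray; it only illustrates that queueing effect with a thread pool standing in for the actor (one worker behaves like one sequential actor, three workers like three actors), with the sleep shortened to 0.2 seconds as an assumption. In Ray itself, the analogous changes would be creating one `TrainInputs` actor per source, or setting `max_concurrency` in the actor options; check the Ray actor documentation before relying on either.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # shortened stand-in for the 5-second sleep in f1/f2/f3


def load(x):
    time.sleep(DELAY)
    return [i for i in range(100 * x)]


def run(workers):
    """Run the three loads on a pool with `workers` threads and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        a, b, c = pool.map(load, [20, 10, 5])
    return time.perf_counter() - start, (len(a), len(b), len(c))


serial_time, sizes = run(workers=1)  # like one sequential actor
parallel_time, _ = run(workers=3)  # like three actors (or max_concurrency=3 in Ray)
print(f"1 worker: {serial_time:.2f}s, 3 workers: {parallel_time:.2f}s, sizes={sizes}")
```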
I also tried an async approach, but since we are not using any additional library (everything is plain Python), the GIL is not bypassed.
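A related point about the async attempt: asyncio runs all coroutines on a single thread, so a blocking call such as `time.sleep` (or a CPU-bound loop) inside a coroutine stalls every other coroutine until it finishes. The stdlib sketch below, with the delay shortened to 0.2 seconds as an assumption, compares a coroutine that blocks the event loop with one that awaits `asyncio.sleep`; only the second lets the three waits overlap.

```python
import asyncio
import time

DELAY = 0.2  # shortened stand-in for the 5-second sleep


async def blocking_load(x):
    time.sleep(DELAY)  # blocks the whole event loop
    return [i for i in range(100 * x)]


async def awaiting_load(x):
    await asyncio.sleep(DELAY)  # yields to the event loop while waiting
    return [i for i in range(100 * x)]


async def timed(load):
    start = time.perf_counter()
    a, b, c = await asyncio.gather(load(20), load(10), load(5))
    return time.perf_counter() - start, (len(a), len(b), len(c))


blocking_time, sizes = asyncio.run(timed(blocking_load))
awaiting_time, _ = asyncio.run(timed(awaiting_load))
print(f"blocking: {blocking_time:.2f}s, awaiting: {awaiting_time:.2f}s")
```

The list comprehension itself never awaits, so it would still run one coroutine at a time in either variant.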
Any ideas? Am I doing something wrong?