I’m trying to create a detached actor so that I can use it from another driver script. This is still local testing. The code works inconsistently (it sometimes fails), and I don’t know how to reproduce either a success or a failure reliably.
I expected the following code to work:
# tf1/main.py
import tensorflow as tf
import time
DEPLOY_TIME = time.time()
class Predictor:
    def __init__(self):
        pass
    def work(self):
        return tf.__version__ + f"|{DEPLOY_TIME}"
# ray_url = "ray://localhost:10002"
if __name__ == "__main__":
    print("Deploy Time:" + str(DEPLOY_TIME))
    import ray
    with ray.init(namespace='indexing'):
        try:
            old = ray.get_actor("tf1")
            print("Killing TF1")
            ray.kill(old)
        except ValueError:
            print("Not Killing TF1 as it's not present")
        PredictorActor = ray.remote(Predictor)
        PredictorActor.options(name="tf1", lifetime="detached").remote()
If I add the following three lines at the end, it works consistently:
a = ray.get_actor("tf1")
print("Named Actor Call")
print(ray.get(a.work.remote()))
I then look up the actor created above from another driver script:
# indexing/main.py
import ray
ray.init(namespace="indexing")
print("Ray Namespace")
print(ray.get_runtime_context().namespace)
print("In Pipeline Indexing Both")
a = ray.get_actor("tf1")
print(ray.get(a.work.remote()))
a = ray.get_actor("tf2")
print(ray.get(a.work.remote()))
My run script
# indexing/run.sh
cd /home/rajiv/Documents/dev/bht/wdml/steps/tf1 &&
source ./venv/bin/activate &&
ray job submit --runtime-env-json='{"working_dir": "./", "pip": ["tensorflow==1.15"], "excludes": ["venv"]}' -- python main.py &&
cd /home/rajiv/Documents/dev/bht/wdml/pipelines/indexing &&
source /home/rajiv/venvs/indexing/bin/activate &&
ray job submit --runtime-env-json='{"working_dir": "./", "pip": []}' -- python main.py
The error I get is
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    a = ray.get_actor("tf1")
  File "/home/rajiv/venvs/tf2/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/rajiv/venvs/tf2/lib/python3.7/site-packages/ray/worker.py", line 2031, in get_actor
    return worker.core_worker.get_named_actor_handle(name, namespace or "")
  File "python/ray/_raylet.pyx", line 1875, in ray._raylet.CoreWorker.get_named_actor_handle
  File "python/ray/_raylet.pyx", line 171, in ray._raylet.check_status
ValueError: Failed to look up actor with name 'tf1'. This could because 1. You are trying to look up a named actor you didn't create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
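One way to check whether this is a timing issue is to retry the lookup for a short while before giving up. The sketch below is only an assumption about how that could look: `get_actor_with_retry`, `timeout_s`, and `poll_s` are hypothetical names, not part of Ray. It relies only on the fact, visible in the traceback above, that a failed lookup raises `ValueError`. The lookup function is passed in, so in the indexing driver it would presumably be `ray.get_actor`.

```python
import time


def get_actor_with_retry(lookup, name, timeout_s=30.0, poll_s=0.5):
    """Call lookup(name) until it stops raising ValueError or the timeout elapses.

    `lookup` is any callable that returns a handle or raises ValueError,
    for example ray.get_actor (assumption: the caller passes it in).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return lookup(name)
        except ValueError:
            # Give up and re-raise the original error once the deadline has passed.
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll_s)


# Hypothetical usage in indexing/main.py, after ray.init(namespace="indexing"):
#   a = get_actor_with_retry(ray.get_actor, "tf1")
#   print(ray.get(a.work.remote()))
```

If the lookup still fails after the timeout, the actor is probably not just slow to appear; it may never have been created or may have died when the first job exited, which would point to one of the other causes listed in the error message.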
Details:
- Ray 1.12 Default
- All code is submitted via the ray job api