Understanding when and why an actor failed to be created

  • Low: It annoys or frustrates me for a moment.

We put another decorator on our Ray actor, and because of that the actor could not load. That part is fine; I guess we can’t add more decorators. The problem is that, from the logs, we couldn’t tell that the actor wasn’t being loaded at all, or why (we just knew that it wasn’t running).

These are the logs we see when everything works:

[2023-01-05 11:11:32,426 W 24 24] (gcs_server) gcs_actor_manager.cc:411: Actor with name ‘RegionPixelCruncher1_0-0’ was not found.
[2023-01-05 11:11:32,426 W 274 321] (python-core-driver-01000000ffffffffffffffffffffffffffffffffffffffffffffffff) actor_manager.cc:112: Failed to look up actor with name ‘RegionPixelCruncher1_0-0’. This could because 1. You are trying to look up a named actor you didn’t create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.
[2023-01-05 11:11:32,427 I 274 340] (python-core-driver-01000000ffffffffffffffffffffffffffffffffffffffffffffffff) actor_manager.cc:214: received notification on actor, state: ALIVE, actor_id: 754b704e30e549d02219611f01000000, ip address: 10.244.0.114, port: 10002, worker_id:

And these are the logs with our added decorator:

[2023-01-05 11:13:13,362 W 24 24] (gcs_server) gcs_actor_manager.cc:411: Actor with name ‘Maximizer0_0-0’ was not found.

[2023-01-05 11:13:13,362 W 274 321] (python-core-driver-01000000ffffffffffffffffffffffffffffffffffffffffffffffff) actor_manager.cc:112: Failed to look up actor with name ‘Maximizer0_0-0’. This could because 1. You are trying to look up a named actor you didn’t create. 2. The named actor died. 3. You did not use a namespace matching the namespace of the actor.

[2023-01-05 11:13:14,364 I 440 470] (python-core-worker-05f7c42247cd6b96780385d61291854bdca0e8986fb119de07267634) core_worker.cc:648: Exit signal received, this process will exit after all outstanding tasks have finished, exit_type=INTENDED_SYSTEM_EXIT, detail=Worker exits because it was idle (it doesn’t have objects it owns while no task or actor has been scheduled) for a long time.

I am trying to understand how I am supposed to get to the root of issues like this.

Thanks

Hmm, you’re right, and I think we can improve the observability experience here. Could you provide a minimal code example that reproduces the issue?
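
In the meantime, here is a rough sketch of how the two log excerpts above could be told apart automatically. It uses only the Python standard library and relies on a heuristic that is an assumption drawn from these two excerpts, not documented Ray behavior: a failed name lookup that is never followed by a `state: ALIVE` notification suggests the actor was never created. The function names `summarize` and `actor_probably_not_created` are hypothetical, and the message fragments it matches are copied from the lines quoted above, so they may differ in other Ray versions.

```python
import re

# Log prefix as it appears in the excerpts above:
# [date time LEVEL pid tid] (component) file.cc:line: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\S+ \S+) (?P<level>[A-Z]) \d+ \d+\] "
    r"\((?P<component>[^)]+)\) (?P<source>\S+:\d+): (?P<message>.*)$"
)


def summarize(log_text):
    """Count failed name lookups, ALIVE notifications, and idle worker exits."""
    summary = {"lookup_failed": 0, "alive": 0, "idle_exit": 0}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # blank line or a line in some other format
        message = match.group("message")
        if message.startswith("Failed to look up actor with name"):
            summary["lookup_failed"] += 1
        elif "received notification on actor, state: ALIVE" in message:
            summary["alive"] += 1
        elif "Worker exits because it was idle" in message:
            summary["idle_exit"] += 1
    return summary


def actor_probably_not_created(log_text):
    """Heuristic (assumption): a failed lookup with no ALIVE notification at all."""
    counts = summarize(log_text)
    return counts["lookup_failed"] > 0 and counts["alive"] == 0
```

Running `actor_probably_not_created` on the driver log text would flag the second excerpt but not the first. It only tells you that creation likely failed, not why, so it is a stopgap rather than a replacement for a clearer error message.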