How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I am using a rollout worker to evaluate a checkpointed APEX policy.
The rollout worker is created by a custom trainable class.
The trainable class is run by the
Everything worked fine before I updated to the 3.0.0dev wheel and merged the latest commit on Ray's master. Now that I have, my program hangs.
The code responsible for this is:
```python
cls = get_trainable_cls("APEX")
agent = cls(config=config, env="custom_env")
```
When I dig a bit into the Ray codebase, I see it actually hangs in:
```python
def get_objects(self, object_refs, timeout=None):
    # some code
    # ...
    # it hangs here:
    data_metadata_pairs = self.core_worker.get_objects(
        object_refs, self.current_task_id, timeout_ms
    )
```
If I set `timeout_ms` myself, I get a timeout exception.
`object_refs` seems to hold references to the APEX workers I used to train the checkpointed policy.
When running with `local_mode=True` I don't have the issue, so this is my workaround for now. Any ideas?