Running trial is not in ready list from ray.wait()

fengxiaoxu96 · December 17, 2021, 10:08am

I am using ray.tune and RLlib to train on a custom gym environment that I defined myself.

However, Tune cannot start training because the trial is never reported as ready.

In detail, the ready list returned by ready, _ = ray.wait(shuffled_results, timeout=timeout) in tune.ray_trial_executor.RayTrialExecutor.get_next_available_trial() does not include my trial_id, so the trial executor cannot access the environment and training in the environment cannot start.
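To make the ready/not-ready split concrete, here is a minimal sketch that does not use Ray at all. It is only an analogy built on the standard library's concurrent.futures.wait, not Ray's implementation: a task that has not finished when the wait returns stays out of the ready set, much as a trial whose result is not yet available stays out of the ready list. The names quick_task, stuck_task, and split_ready are hypothetical.

```python
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def quick_task():
    # Finishes immediately, like a trial whose result is already available.
    return "quick"


def stuck_task(release):
    # Hypothetical stand-in for a trial whose result does not arrive in time:
    # it blocks until the caller sets the event.
    release.wait()
    return "stuck"


def split_ready(timeout_s=0.5):
    """Return (ready_names, pending_names) after waiting up to timeout_s."""
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "quick": pool.submit(quick_task),
            "stuck": pool.submit(stuck_task, release),
        }
        # Returns as soon as one future is done, or when the timeout expires.
        done, not_done = wait(
            futures.values(), timeout=timeout_s, return_when=FIRST_COMPLETED
        )
        ready_names = sorted(n for n, f in futures.items() if f in done)
        pending_names = sorted(n for n, f in futures.items() if f in not_done)
        # Unblock the stuck task so the pool can shut down cleanly.
        release.set()
    return ready_names, pending_names
```

Calling split_ready() puts the blocked task in the pending list no matter how long the timeout is, which matches the observation that a longer timeout alone does not help when the underlying work never completes.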

I tried many times, and only a few runs started training. In most cases, the ready list does not include my trial. I also tried changing the timeout in that function, but it did not solve the problem, so I do not think the timeout is the cause.

I also tried the common gym environment CartPole and it is always included in the ready list so the training can start without problems.
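Since CartPole works and the custom environment usually does not, one way to narrow things down is to time the custom environment on its own, outside Tune. The sketch below assumes the old gym-style API (reset() returns an observation, step() returns a 4-tuple). DummyEnv is a hypothetical placeholder for the custom environment and time_env is a hypothetical helper. Whether slow or blocking construction is actually the cause here is only a guess that this check could confirm or rule out.

```python
import time


class DummyEnv:
    """Hypothetical stand-in for the custom environment (old gym-style API)."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0.0

    def step(self, action):
        self.steps += 1
        done = self.steps >= 5
        # (observation, reward, done, info)
        return float(self.steps), 1.0, done, {}


def time_env(env_factory, n_steps=10, action=0):
    """Time construction, reset, and stepping of an environment in isolation."""
    timings = {}

    start = time.perf_counter()
    env = env_factory()
    timings["construct_s"] = time.perf_counter() - start

    start = time.perf_counter()
    env.reset()
    timings["reset_s"] = time.perf_counter() - start

    start = time.perf_counter()
    steps_done = 0
    for _ in range(n_steps):
        _, _, done, _ = env.step(action)
        steps_done += 1
        if done:
            # Start a new episode so stepping can continue.
            env.reset()
    timings["step_avg_s"] = (time.perf_counter() - start) / n_steps
    timings["steps"] = steps_done
    return timings
```

Running time_env on a factory for the real environment and comparing the numbers against CartPole would show whether construction or reset takes unusually long or never returns.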

Does anyone have an idea why a trial with one particular environment would never be returned as ready by ray.wait?

The lowest-level function called is

ready_ids, remaining_ids = worker.core_worker.wait(
    object_refs,
    num_returns,
    timeout_milliseconds,
    worker.current_task_id,
    fetch_local,
)

in ray/worker.py

I would appreciate any help with this problem!

Clark_Zinzow · January 6, 2022, 9:16pm

@fengxiaoxu96 you may want to ask this question in the #all-about-ray-tune category so the ML team will be aware of it! cc @matthewdeng
