Hello there,
I tried training a BC algorithm on offline data with the new RL Module API stack enabled in the algorithm configuration. I ran the code on Google Colab, which only provides 2 CPUs, and ran into the following error:
The following resource request cannot be scheduled right now: {'CPU': 1.0}.
This is likely due to all cluster resources being claimed by actors.
If I disable the RL module, the BC algorithm runs without problems. With the RL module enabled, however, training gets stuck and the scheduling warning above keeps repeating.
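For reference, the run that works is essentially the same config with the new API stack flags turned off (a rough sketch from memory, not the exact code I ran):

config = (
    BCConfig()
    .api_stack(
        enable_rl_module_and_learner=False,
        enable_env_runner_and_connector_v2=False,
    )
    # ... environment / offline_data / training settings as in the full code below ...
)
algo = config.build()
result = algo.train()  # this version finishes without the scheduling warning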
My questions are:
- How can I check which resources (CPUs, actors, etc.) are currently being scheduled or used?
- How can I identify what might be causing this scheduling issue in my code?
Would really appreciate any suggestions or debugging tips!
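So far the only check I know is printing Ray's resource counters right after config.build() (a minimal sketch using ray.cluster_resources() / ray.available_resources()), but that doesn't tell me which actors are actually holding the CPUs:

import ray

# Total resources Ray detected on the Colab runtime (should report 2 CPUs)
print(ray.cluster_resources())

# Resources that are still unclaimed at this moment
print(ray.available_resources())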
My Environment
- Platform: Google Colab (2 CPUs)
- Ray version: 3.0.0.dev0
- Python version: 3.10
My Code
import gymnasium as gym
from ray.rllib.algorithms.bc import BCConfig
from ray.rllib.core.rl_module.rl_module import RLModule, RLModuleSpec
from ray.rllib.core.testing.torch.bc_module import DiscreteBCTorchModule

config = (
    BCConfig()
    # Enable the new API stack (RLModule + Learner, EnvRunner + ConnectorV2).
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")
    # Run the single learner in the main process (no extra learner actors).
    .learners(num_learners=0)
    # Offline CartPole data, read via Ray Data's read_json.
    .offline_data(
        input_="/content/cartpole/large.json",
        input_read_method="read_json",
        dataset_num_iters_per_learner=1,
    )
    .training(lr=0.00001, gamma=0.99, beta=0.0)
    .rl_module(rl_module_spec=RLModuleSpec(module_class=DiscreteBCTorchModule))
    .evaluation(
        evaluation_interval=1,
        evaluation_num_env_runners=1,
        evaluation_duration=1,
    )
)

algo = config.build()
result = algo.train()
Output Message
2025-10-17 07:46:34,505 INFO worker.py:1783 -- Started a local Ray instance.
2025-10-17 07:46:43,961 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-10-17_07-46-30_753240_7705/logs/ray-data
2025-10-17 07:46:43,961 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2025-10-17 07:46:56,308 INFO trainable.py:161 -- Trainable.setup took 25.799 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2025-10-17 07:46:56,309 WARNING util.py:61 -- Install gputil for GPU system monitoring.
2025-10-17 07:46:56,323 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2025-10-17_07-46-30_753240_7705/logs/ray-data
2025-10-17 07:46:56,323 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON] -> ActorPoolMapOperator[MapBatches(OfflinePreLearner)]
(autoscaler +52s) Tip: use ray status to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +52s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +1m27s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
(autoscaler +2m2s) Warning: The following resource request cannot be scheduled right now: {'CPU': 1.0}. This is likely due to all cluster resources being claimed by actors. Consider creating fewer actors or adding more nodes to this Ray cluster.
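While the run hangs I can only poke at it from a separate cell or terminal, so my current plan is the snippet below (based on the ray status tip in the logs and the Ray state API; please tell me if there is a better way to see who is claiming the CPUs):

# In a separate Colab cell or terminal while training is stuck:
#   !ray status        # cluster-level summary, as the autoscaler tip suggests
#   !ray list actors   # CLI view of all actors and their states

from ray.util.state import list_actors

# Print every actor Ray knows about, to see which ones are holding the 2 CPUs.
for actor in list_actors():
    print(actor)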