How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
Hi. For several months I have been using and enhancing a single-agent RL solution with a custom NN and a custom environment, trained with PPO, with lots of success on the legacy RLlib API in Ray 2.40. This week I decided to migrate to the new API stack and am seeing some problems I can't understand. I could roll back to my older version, but it feels like I'm almost there and I want to push across the finish line. I have just moved to Ray 2.42.1 and can't find answers in any of the documentation.
Problem 1: Once the training loop begins, the first call to algo.train() hangs for many seconds, then the worker dies with exit code 1. With this I am basically dead in the water. That worker's error log in /tmp/ray shows the following:
ERROR actor_manager.py:187 -- Worker exception caught during `apply()`: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.
Traceback (most recent call last):
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/execution/rollout_ops.py", line 110, in <lambda>
else (lambda w: (w.sample(**random_action_kwargs), w.get_metrics()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/env/single_agent_env_runner.py", line 205, in sample
samples = self._sample(
^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
return method(self, *_args, **_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/env/single_agent_env_runner.py", line 304, in _sample
to_env = self._module_to_env(
^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/connector_pipeline_v2.py", line 111, in __call__
batch = connector(
^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 62, in __call__
self._get_actions(batch, rl_module, explore)
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 68, in _get_actions
if Columns.ACTIONS in batch:
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/torch/_tensor.py", line 1180, in __contains__
raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.
Edit: SOLVED. After a tremendous amount of debugging in the RLlib source code, I discovered how to make my _forward method generate the required ACTIONS, ACTION_DIST_INPUTS and ACTION_LOGP tensors that were never needed with the old API, and how to revamp the generation and return of value-function results, which is handled completely differently from the old API. It is unfortunate that the migration docs don't give better hints as to what is required for a non-trivial example.
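For anyone else hitting this, here is a minimal sketch of the shape it boils down to. The toy network and the discrete action space (TorchCategorical) are placeholders, not my actual model. The essential points are that _forward must return a dict keyed by Columns rather than a raw tensor (the raw tensor is what triggers the Tensor.__contains__ error), and that the value function moves into ValueFunctionAPI.compute_values(). Returning ACTION_DIST_INPUTS is enough for the default module-to-env connector to fill in ACTIONS and ACTION_LOGP, although you can also compute those yourself in _forward.

import torch
from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.apis.value_function_api import ValueFunctionAPI
from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule
from ray.rllib.models.torch.torch_distributions import TorchCategorical


class MyCustomNN(TorchRLModule, ValueFunctionAPI):
    def setup(self):
        # Build (toy) policy and value heads from the spaces RLlib hands in.
        # Assumes a 1-D Box observation space and a Discrete action space.
        obs_size = self.observation_space.shape[0]
        num_actions = self.action_space.n
        self._pi = torch.nn.Linear(obs_size, num_actions)
        self._vf = torch.nn.Linear(obs_size, 1)

    def _forward(self, batch, **kwargs):
        # Return a dict keyed by Columns. With ACTION_DIST_INPUTS (logits) present,
        # the GetActions connector samples ACTIONS and computes ACTION_LOGP for you.
        return {Columns.ACTION_DIST_INPUTS: self._pi(batch[Columns.OBS])}

    def get_inference_action_dist_cls(self):
        return TorchCategorical

    def get_exploration_action_dist_cls(self):
        return TorchCategorical

    def get_train_action_dist_cls(self):
        return TorchCategorical

    def compute_values(self, batch, embeddings=None):
        # The value function is no longer returned from forward();
        # PPO's learner calls this ValueFunctionAPI method instead.
        return self._vf(batch[Columns.OBS]).squeeze(-1)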
Problem 2: As the training program begins I get this message:
WARNING rl_module.py:419 -- Could not create a Catalog object for your RLModule! If you are not using the new API stack yet, make sure to switch it off in your config: `config.api_stack(enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False)`. All algos use the new stack by default. Ignore this message, if your RLModule does not use a Catalog to build its sub-components.
Training seems to keep running, so it isn't clear whether there is actually a problem. But if it isn't a problem, then the message should not be labeled "WARNING". I cannot find anything in the docs that says whether, or how, to use a model Catalog in the new API.
Problem 3: Somewhere during setup this message is generated:
WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
Again, it seems benign for the time being (though I disagree with tagging a deprecation notice as "WARNING"; that feels too severe). But I have no idea what part of my code is triggering it. The constructor call it shows is not present anywhere in my code.
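For reference, the kwargs-style construction the deprecation points at looks roughly like the sketch below. The env and spaces here are placeholders; normally RLlib builds the module from the RLModuleSpec in the config for you, so if nothing in your own code calls RLModule(config=...), the warning is presumably coming from inside RLlib itself.

# New-style RLModule construction: keyword arguments instead of an RLModuleConfig.
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

spec = RLModuleSpec(
    module_class=MyCustomNN,
    observation_space=env.observation_space,  # placeholder env
    action_space=env.action_space,
    model_config={},  # arrives in the module as self.model_config
)
module = spec.build()  # constructs MyCustomNN(observation_space=.., action_space=.., ..)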
Problem 4: Where did all the training results go? My old code had full_results = algo.train()
and made extensive use of the large dict of info in those results. Now the returned dict has only a handful of items, excluding keys that are important to me, like "counters" and "info" (which I used to get "cur_lr", among other things). If this info is no longer returned by train(), where can I go to get it?
Edit: SOLVED. Since solving Problem 1 above, all of the results I need are now being presented, just with a somewhat different dict structure, which is no big deal.
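For anyone searching later, here is a rough sketch of where the equivalent values live in the new-stack result dict. The episode stats moved under "env_runners" and per-module learner stats sit under "learners"; the exact learner-stat key names seem to vary between versions, so the sketch just prints them out to locate the learning-rate entry rather than assuming one.

# Rough sketch: pulling the same stats out of the new-stack result dict.
def extract_stats(full_result):
    # Episode stats now live under the "env_runners" sub-dict.
    env_stats = full_result["env_runners"]
    rmin = env_stats["episode_return_min"]
    rmean = env_stats["episode_return_mean"]
    rmax = env_stats["episode_return_max"]
    eplen = env_stats["episode_len_mean"]

    # Per-module learner stats replace the old info/learner/learner_stats layout.
    # Print the keys once to find the learning-rate / loss entries in your version.
    learner_stats = full_result["learners"]["default_policy"]
    print(sorted(learner_stats.keys()))

    return rmin, rmean, rmax, eplen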
A trimmed-down copy of my training script is below:
import sys

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# Project-specific pieces referenced below (InvestEnv, MyCustomNN, CustomCallbacks,
# DATA_PATH, max_iterations) are defined elsewhere and trimmed from this copy.


def main(argv):
    ray.init(storage=DATA_PATH)

    cfg = PPOConfig()
    cfg.framework("torch")

    # Manage the Ray API migration path - explicitly force the new API stack.
    # Upgrade this at any time per https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
    cfg.api_stack(enable_rl_module_and_learner=True,
                  enable_env_runner_and_connector_v2=True)

    env_config = {}
    cfg.environment(env=InvestEnv, env_config=env_config)

    # This feels wrong, but it reflects the migration guide.
    cfg.rl_module(rl_module_spec=RLModuleSpec(module_class=MyCustomNN))

    cfg.resources(num_cpus_for_main_process=4)
    cfg.env_runners(explore=True,
                    num_env_runners=4,
                    num_cpus_per_env_runner=4,
                    num_gpus_per_env_runner=0.1,
                    num_envs_per_env_runner=2,
                    rollout_fragment_length=32,
                    )
    cfg.learners(num_learners=1,
                 num_gpus_per_learner=0.5,
                 )
    cfg.callbacks(CustomCallbacks)
    cfg.training(gamma=0.995,
                 lr=4e-5,
                 minibatch_size=128,
                 entropy_coeff=0.004,
                 kl_coeff=0.5,
                 clip_param=0.2,
                 grad_clip=0.7,
                 grad_clip_by="norm",
                 )

    algo = cfg.build_algo()

    # Run the training loop
    for iteration in range(1, max_iterations + 1):
        full_result = algo.train()

        # Two of these data extractions no longer work
        counters = full_result["counters"]
        result = full_result["env_runners"]
        actual_lr = full_result["info"]["learner"]["default_policy"]["learner_stats"]["cur_lr"]

        # Capture reward stats as a set of moving averages - these are no longer available!
        rmin = result["episode_return_min"]
        rmean = result["episode_return_mean"]
        rmax = result["episode_return_max"]
        eplen = result["episode_len_mean"]

        # Print status & analyze progress here...

    algo.stop()
    ray.shutdown()


if __name__ == "__main__":
    main(sys.argv)
I would appreciate any clarity you could provide on any of these problems!
Thank you,
John