Confusion migrating to new API

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hi. I have been using & enhancing a single-agent RL solution with a custom NN and custom environment, trained with PPO, for several months, with lots of success using the legacy RLlib API on Ray 2.40. This week I decided to migrate to the new API stack, and I am seeing some problems that I can’t understand. I could roll back to my older version, but it feels like I’m almost there and I want to push across the finish line. I just moved to Ray 2.42.1 and can’t find answers in any of the documentation.

Problem 1: Once the training loop begins, the first call to algo.train() hangs for many seconds, then the worker dies with exit code 1. This leaves me basically dead in the water. That worker’s error log in /tmp/ray shows the following:

ERROR actor_manager.py:187 -- Worker exception caught during `apply()`: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.
Traceback (most recent call last):
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/utils/actor_manager.py", line 183, in apply
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/execution/rollout_ops.py", line 110, in <lambda>
    else (lambda w: (w.sample(**random_action_kwargs), w.get_metrics()))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/env/single_agent_env_runner.py", line 205, in sample
    samples = self._sample(
              ^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
    return method(self, *_args, **_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/env/single_agent_env_runner.py", line 304, in _sample
    to_env = self._module_to_env(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/connector_pipeline_v2.py", line 111, in __call__
    batch = connector(
            ^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 62, in __call__
    self._get_actions(batch, rl_module, explore)
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/ray/rllib/connectors/module_to_env/get_actions.py", line 68, in _get_actions
    if Columns.ACTIONS in batch:
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/starkj/miniconda3/envs/trader3/lib/python3.12/site-packages/torch/_tensor.py", line 1180, in __contains__
    raise RuntimeError(
RuntimeError: Tensor.__contains__ only supports Tensor or scalar, but you passed in a <class 'str'>.

Edit: SOLVED. After a tremendous amount of debugging in the RLlib source code, I discovered how to make my _forward() method generate the required ACTIONS, ACTION_DIST_INPUTS, and ACTION_LOGP tensors, which were never needed with the old API, and how to revamp the generation and return of value-function results, which is handled totally differently from the calls invoked in the old API. It is unfortunate that the migration docs don’t give better hints about what is required in a non-trivial example.
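For anyone else who hits this, here is a heavily trimmed, illustrative sketch of the shape that worked for me (this is not my real network; the toy encoder, layer sizes, and discrete-action assumption are placeholders). In my reading of the GetActions connector in the traceback above, returning ACTION_DIST_INPUTS lets it sample ACTIONS and ACTION_LOGP itself; I ended up computing them explicitly, but the key point is that _forward() must return a dict keyed by Columns.*.

import torch
from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.apis import ValueFunctionAPI
from ray.rllib.core.rl_module.torch import TorchRLModule

class MyCustomNN(TorchRLModule, ValueFunctionAPI):
    def setup(self):
        # Toy encoder and heads, just to show the structure (assumes Box obs, Discrete actions).
        obs_dim = self.observation_space.shape[0]
        self._encoder = torch.nn.Sequential(torch.nn.Linear(obs_dim, 64), torch.nn.ReLU())
        self._pi_head = torch.nn.Linear(64, self.action_space.n)
        self._vf_head = torch.nn.Linear(64, 1)

    def _forward(self, batch, **kwargs):
        # Must return a dict keyed by Columns.*. Returning a bare tensor instead
        # of a dict is what produced the Tensor.__contains__ error in my case.
        features = self._encoder(batch[Columns.OBS])
        return {Columns.ACTION_DIST_INPUTS: self._pi_head(features)}

    def compute_values(self, batch, embeddings=None):
        # Replaces the old value_function(). RLlib can call this on its own,
        # without a prior _forward() on the same batch, so recompute here.
        features = self._encoder(batch[Columns.OBS])
        return self._vf_head(features).squeeze(-1)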

Problem 2: As the training program begins I get this message:
WARNING rl_module.py:419 -- Could not create a Catalog object for your RLModule! If you are not using the new API stack yet, make sure to switch it off in your config: `config.api_stack(enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False)`. All algos use the new stack by default. Ignore this message, if your RLModule does not use a Catalog to build its sub-components.
It seems to keep running, so it’s not clear whether there is actually a problem. But if it’s not a problem, then the message should not be labeled “WARNING”. I cannot find anything that says whether, or how, to use a model Catalog in the new API.

Problem 3: Somewhere in the setup this message is generated:
WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!
Again, it seems benign for the time being (though I disagree with tagging a deprecation notice as “WARNING”; that feels too severe). But I have no idea what part of my code is generating it. The call it shows is not present anywhere in my code.

Problem 4: Where did all the training results go? My old code had full_results = algo.train() and made extensive use of the large dict of info in those results. Now the returned dict only has a handful of items, excluding keys important to me like “counters” and “info” (which I used to get “cur_lr”, among others). If this info is no longer returned by train(), where can I go to get it?
Edit: SOLVED. Since solving problem 1 above, all of the results I need are being presented again, just with a somewhat different dict structure, which is no big deal.
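For reference, the episode stats I use now come out of the “env_runners” sub-dict of the train() results (key names as they appear in my Ray 2.42.1 run; double-check them on your version):

env_stats = full_result["env_runners"]
rmin = env_stats["episode_return_min"]
rmean = env_stats["episode_return_mean"]
rmax = env_stats["episode_return_max"]
eplen = env_stats["episode_len_mean"]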

A trimmed-down copy of my training script is below.

import sys

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# (Imports of my own pieces - InvestEnv, MyCustomNN, CustomCallbacks, DATA_PATH,
# max_iterations - are trimmed for brevity.)

def main(argv):
    ray.init(storage = DATA_PATH)
    cfg = PPOConfig()
    cfg.framework("torch")

    # Manage the Ray API migration path - force it to the new API stack (old stack is the default)
    # Upgrade this at any time per https://docs.ray.io/en/master/rllib/new-api-stack-migration-guide.html
    cfg.api_stack(enable_rl_module_and_learner = True, enable_env_runner_and_connector_v2 = True)

    env_config = { }
    cfg.environment(env = InvestEnv, env_config = env_config)

    # This feels wrong, but it reflects the migration guide
    cfg.rl_module(rl_module_spec = RLModuleSpec(module_class = MyCustomNN))

    cfg.resources(  num_cpus_for_main_process   = 4)
    cfg.env_runners(explore                     = True,
                    num_env_runners             = 4,
                    num_cpus_per_env_runner     = 4,
                    num_gpus_per_env_runner     = 0.1,
                    num_envs_per_env_runner     = 2,
                    rollout_fragment_length     = 32,
    )

    cfg.learners(   num_learners                = 1,
                    num_gpus_per_learner        = 0.5,
    )
    cfg.callbacks(CustomCallbacks)
    cfg.training(   gamma                       = 0.995,
                    train_batch_size_per_learner= 4e-5,
                    minibatch_size              = 128,
                    entropy_coeff               = 0.004,
                    kl_coeff                    = 0.5,
                    clip_param                  = 0.2,
                    grad_clip                   = 0.7,
                    grad_clip_by                = "norm",
    )

    algo = cfg.build_algo()

    # Run the training loop
    for iteration in range(1, max_iterations + 1):
        full_result = algo.train()

        # Two of these data extractions no longer work
        counters = full_result["counters"]
        result = full_result["env_runners"]
        actual_lr = full_result["info"]["learner"]["default_policy"]["learner_stats"]["cur_lr"]

        # Capture reward stats as a set of moving averages - these are no longer available!
        rmin = result["episode_return_min"]
        rmean = result["episode_return_mean"]
        rmax = result["episode_return_max"]
        eplen = result["episode_len_mean"]

        # Print status & analyze progress here...

    algo.stop()
    ray.shutdown()

if __name__ == "__main__":
    main(sys.argv)

I would appreciate any clarity you could provide on any of these problems!
Thank you,
John

I have resolved problems 1 & 4 myself. See embedded edits. Still looking for advice on problems 2 & 3, please.

Thank you for the update. I am also struggling with the new RLModule. I try to use .rl_module, but the ValueError also refers to the RLModule, and I don’t know how to integrate it with the usual way of using an algorithm config. Can I only use specs there?!

https://docs.ray.io/en/latest/rllib/rllib-rlmodule.html

Can you be a little more specific about what you mean by “the ValueError also refers to RLModule”? If your NN model is derived from RLModule, then it needs a new method named compute_values(). This replaces the old value_function(), but it is used a little differently. I used to have the forward (now _forward()) method compute an internal self._value and store it for value_function() to simply return. But now it seems normal for RLlib to create a new model instance and immediately call compute_values() without ever calling _forward() first, and in that call it provides a number of samples that is larger than the defined minibatch size. So I just have compute_values() do a direct call to _forward() and then return its computed value variable.
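In code, the idea is roughly this (a sketch, assuming your _forward() stores the value estimate in self._value the way my old model did):

def compute_values(self, batch, embeddings=None):
    # RLlib may call this without having run _forward() on this batch first,
    # so run the forward pass here; it stores the value estimate as a side effect.
    self._forward(batch)
    return self._value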

I haven’t touched my custom model yet and tried to just run a standard model. The config looks like this:

config = (
    PPOConfig()
    .environment(env="EnvPySC2")
    .framework("torch")
    .env_runners(
        # env_to_module_connector=lambda env: FlattenObservations(),
        sample_timeout_s=300,
    )
)

If I don’t use the env_to_module_connector with FlattenObservations, I run into two errors:

Anaconda3\envs\pysc2-ray-2-42\lib\site-packages\ray\rllib\algorithms\ppo\default_ppo_rl_module.py", line 31, in setup   
(SingleAgentEnvRunner pid=19884)     self.catalog.actor_critic_encoder_config.base_encoder_config,
(SingleAgentEnvRunner pid=19884) AttributeError: 'NoneType' object has no attribute 'actor_critic_encoder_config'

ValueError: `RLModule(config=[RLModuleConfig])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., 
learner_only=.., model_config=..)` instead.

However, with FlattenObservations, and with or without the explicit use of .rl_module in the AlgorithmConfig, I regularly get the same warnings as you:

WARNING deprecation.py:50 -- DeprecationWarning: `RLModule(config=[RLModuleConfig object])` has been deprecated. Use `RLModule(observation_space=.., action_space=.., inference_only=.., model_config=.., catalog_class=..)` instead. This will raise an error in the future!

WARNING rl_module.py:419 -- Could not create a Catalog object for your RLModule! If you are not using the new API stack yet, make sure to switch it off in your config: `config.api_stack(enable_rl_module_and_learner=False, enable_env_runner_and_connector_v2=False)`. All algos use the new stack by default. Ignore this message, if your RLModule does not use a Catalog to build its sub-components.

There are two things here that I don’t really understand:

  1. How do I approach the deprecation warning? I thought I could just give a spec to the config like this:
        .rl_module(
            rl_module_spec=RLModuleSpec(
                module_class=DefaultPPOTorchRLModule,
                model_config={
                    "head_fcnet_hiddens": [64, 64],
                    "head_fcnet_activation": "relu",
                },
                catalog_class=PPOCatalog,
            ),
        )

This would be a construction through RLModuleSpec, as described in the RL Modules page of the Ray 2.42.1 docs.

But is the warning telling me to instead do the construction through the class constructor? If so, I don’t understand how to integrate a class-constructed RLModule with the AlgorithmConfig.

  2. I have no idea what exactly the Catalog object is and what it needs for creation.

Yes, exactly. I stand with you in your confusion! We seem to be thinking about it the same way. Sorry I am of no help at this point.

John Stark
