Actor-Critic Net Structure

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity

Given an instantiated algorithm (via, for example, algo =, where config specifies PPOConfig(), an environment, and the torch framework, among other things), is it possible to dig down through algo to see what the actor-critic model actually consists of, i.e., the layers of the neural network(s)?

I’m curious because Ray, which I’ve used for other things in distributed computing, imposes a lot of overhead, and it seems difficult to get out the specific data I want to see. So I was thinking of developing a PPO system using the actor-critic model that Ray generates for my custom environment.

@jjgriffin2 This is possible. I would take a debugging approach, since it lets you instantly print a summary of the model and see the layers it consists of. Which version of Ray is installed? Are you using the old or the new stack (AlgorithmConfig.experimental(_enable_new_api_stack) or, in older versions, AlgorithmConfig.rl_module(_enable_rl_module=True))?
You can find the code for the models in the respective catalogs (rllib/models/ for the old stack, or rllib/core/models/ for the new stack).
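To illustrate the debugging approach: once a debugger (for example Python's built-in breakpoint() or an IDE breakpoint) has stopped your script after the algorithm is built, any torch nn.Module you can reach can simply be printed, and print() shows its full submodule tree. How you reach that module depends on the Ray version and stack; on the old stack, algo.get_policy().model is one common route, but treat that accessor as an assumption and check it against your installed version. The sketch below therefore uses a hand-built, hypothetical ActorCritic stand-in (separate policy and value MLPs, layer sizes chosen purely for illustration) plus a small hypothetical summarize() helper that lists each leaf layer with its parameter count.

```python
import torch
from torch import nn


class ActorCritic(nn.Module):
    """Hypothetical stand-in for the torch model inside an RLlib algorithm.

    Separate policy (actor) and value (critic) MLPs; sizes are illustrative only.
    """

    def __init__(self, obs_dim: int = 4, num_actions: int = 2, hidden: int = 256):
        super().__init__()
        self.pi = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_actions),  # action logits
        )
        self.vf = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # state-value estimate
        )

    def forward(self, obs: torch.Tensor):
        # Returns (action logits, value estimate per observation).
        return self.pi(obs), self.vf(obs).squeeze(-1)


def summarize(model: nn.Module) -> list[str]:
    """Return one line per leaf layer: name, layer repr, and parameter count."""
    lines = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            n_params = sum(p.numel() for p in module.parameters())
            lines.append(f"{name:<8} {module!r:<50} params={n_params}")
    return lines


model = ActorCritic()
print(model)  # nested repr of the whole module tree
for line in summarize(model):
    print(line)
print("total params:", sum(p.numel() for p in model.parameters()))
```

Pointing summarize() at the module obtained from the algorithm should give the same kind of per-layer listing, assuming it is a plain torch nn.Module.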

RLlib is made for RL production workloads and comes with some overhead when, e.g., prototyping. We have put a lot of effort into moving from our old stack to a new one that is significantly faster and removes a couple of software layers. The transition to the new stack should be completed around Ray Summit 2023.

I’m interested in your Ray experience: where exactly do you see a lot of overhead?

I’m using the nightly build of Ray with the new stack. I’ve seen the code.

What I’m really referring to is the difficulty of getting specific data returned from the environment in the info field after a step (classically, state, raw, done, term, info = env.step(action)), along with the action taken, and storing it somewhere outside. I’ve set up custom callbacks, but that just slows everything to a standstill. The environment is pretty complex, but I’m not doing anything so esoteric that it requires the full power of Ray. So a simpler approach seems warranted, hence my interest in taking the model that Ray produces and applying it in a non-Ray approach, instead of attempting to hook into a fairly sealed system.
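Once the model lives outside Ray, getting at info and the chosen actions is simply a matter of owning the rollout loop. The following is a minimal sketch under stated assumptions: the environment follows the Gymnasium step API, which returns (observation, reward, terminated, truncated, info); ToyEnv, run_episode, and the random policy are hypothetical placeholders, where in practice the policy would wrap the torch model taken from Ray. Each step is written as one JSON line, so nothing has to pass through callbacks.

```python
import io
import json
import random


class ToyEnv:
    """Hypothetical stand-in for a custom env with Gymnasium-style reset/step."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0 if action == 1 else 0.0
        terminated = self.t >= self.max_steps
        truncated = False
        info = {"t": self.t, "custom_metric": self.t * 10}  # whatever you want to inspect
        return obs, reward, terminated, truncated, info


def run_episode(env, policy, sink):
    """Step one episode, writing one JSON line per step with action, reward, and info."""
    obs, info = env.reset()
    done = False
    n = 0
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        sink.write(json.dumps({"action": action, "reward": reward, "info": info}) + "\n")
        done = terminated or truncated
        n += 1
    return n


rng = random.Random(0)
buf = io.StringIO()  # in-memory sink; a real file handle works the same way
steps = run_episode(ToyEnv(), lambda obs: rng.choice([0, 1]), buf)
print(steps, "steps logged")
```

Writing to a file opened in append mode instead of the io.StringIO buffer would persist the log across runs.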

BTW, you mention a “debugging approach.” What does that mean?