Actor-Critic Net Structure

jjgriffin2 · March 20, 2024, 1:00am

How severe does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

Given an instantiated algorithm (via, for example, algo = config.build(), where config specifies PPOConfig(), an environment, and the torch framework, among other things), is it possible to dig down through algo to see what the actor-critic model actually consists of, i.e., the layers of the neural network(s)?

I’m curious because Ray, which I’ve used for other things in distributed computing, imposes a lot of overhead and it seems difficult to get the specific data out that I want to see. So I was thinking of developing a PPO system using the actor-critic model that Ray generates for my custom environment.

Lars_Simon_Zehnder · March 21, 2024, 9:39am

@jjgriffin2 This is possible. I would use a debugging approach, since you can then call a summary on the model right away and see which layers it consists of. Which version of Ray is installed? Are you using the old or the new stack (AlgorithmConfig.experimental(_enable_new_api_stack=True), or in older versions AlgorithmConfig.rl_module(_enable_rl_module=True))?
You can find the code for the models in the respective catalogs (rllib/models/catalog.py, or rllib/core/models/catalog.py for the new stack).
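
A minimal sketch of the "summary" idea, assuming you already hold the underlying torch.nn.Module. How you get it out of algo depends on the Ray version and API stack; on the old stack algo.get_policy().model is the usual route as far as I know, so treat that as an assumption and check it in a debugger. The helper names summarize_layers and print_summary are made up for illustration; the function only relies on the standard torch.nn.Module methods named_modules(), children(), and parameters().

```python
from typing import Any, List, Tuple


def summarize_layers(model: Any) -> List[Tuple[str, str, int]]:
    """Return (name, layer type, parameter count) for every leaf submodule.

    `model` is expected to behave like a torch.nn.Module, i.e. to expose
    named_modules(), children() and parameters().
    """
    rows = []
    for name, sub in model.named_modules():
        # Skip containers; only report layers that have no child modules.
        if any(True for _ in sub.children()):
            continue
        n_params = sum(p.numel() for p in sub.parameters())
        rows.append((name or "<root>", type(sub).__name__, n_params))
    return rows


def print_summary(model: Any) -> None:
    """Print one aligned line per leaf layer."""
    for name, kind, n_params in summarize_layers(model):
        print(f"{name:<40} {kind:<20} {n_params:>10,d}")
```

With torch installed, print_summary(torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2))) lists the two Linear layers and the Tanh with their parameter counts. Calling print(model) on a torch module also gives a quick nested view without any helper.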

RLlib is made for RL production workloads and comes with some overhead when, e.g., prototyping. We have put a lot of effort into moving from our old stack to a new one, which is significantly faster and removes a couple of software layers. Transferring to the new stack should be completed around Ray Summit 2023.

I'm interested in your experiences with Ray: where exactly do you see a lot of overhead?

jjgriffin2 · March 21, 2024, 1:36pm

The Ray version is a nightly build, using the new stack. I’ve seen the code.

What I’m really referring to is the difficulty of getting specific data that the environment returns in the info field after a step (classically, state, reward, done, term, info = env.step(action)), together with the action taken, and storing it somewhere outside. I’ve set custom callbacks, but that just slows everything to a standstill. The environment is pretty complex, but I’m not doing anything so esoteric that it requires the full power of Ray. So a simpler approach seems warranted, hence my interest in taking the model that Ray produces and applying it in a non-Ray approach, rather than trying to hook into a fairly sealed system.
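
For the plain, non-Ray loop described above, one way to capture the action and info of every step is a thin recording wrapper around the environment. This is only a sketch: StepRecorder is a hypothetical name, it is not an RLlib or Gymnasium class, and it assumes only that the wrapped env has reset() and step() and that info is the last element of the step tuple.

```python
import json
from typing import IO, Any


class StepRecorder:
    """Wrap an env and append one JSON line per step to a file-like sink."""

    def __init__(self, env: Any, sink: IO[str]):
        self.env = env
        self.sink = sink
        self.episode = -1  # becomes 0 on the first reset()
        self.t = 0

    def reset(self, **kwargs):
        self.episode += 1
        self.t = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        result = self.env.step(action)
        info = result[-1]  # info comes last in both 4- and 5-tuple step APIs
        record = {"episode": self.episode, "t": self.t,
                  "action": action, "info": info}
        # default=str stringifies values json cannot encode (e.g. arrays).
        self.sink.write(json.dumps(record, default=str) + "\n")
        self.t += 1
        return result

    def __getattr__(self, name):
        # Forward everything else (action_space, observation_space, ...).
        if name == "env":  # avoid infinite recursion before __init__ has run
            raise AttributeError(name)
        return getattr(self.env, name)
```

Usage would look like env = StepRecorder(MyEnv(), open("steps.jsonl", "w")), where MyEnv stands in for the custom environment. The resulting file holds one record per step and can be read back line by line with json.loads.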

jjgriffin2 · March 21, 2024, 1:38pm

BTW, you mention a “debugging approach.” What does that mean?
