To get the expected value (the value function output) for a given state observation from a trained PPO model on the new RLlib API stack, use the RLModule's compute_values method. This requires access to the RLModule (e.g., via algo.get_module()), and the observation should be passed in batch format (typically a dictionary with the key "obs" mapping to a torch tensor with a leading batch dimension). The value function output is returned as a tensor. For example:
import torch

obs = ...  # your observation as a NumPy array
# Get the trained RLModule from the Algorithm (single-agent / default module).
module = algo.get_module()
# compute_values expects a batch dict keyed by "obs"; unsqueeze(0) adds the batch
# dimension, and .float() ensures float32, which the default torch models expect.
value = module.compute_values({"obs": torch.from_numpy(obs).float().unsqueeze(0)})
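The returned tensor typically has shape (batch_size,), so for a single observation you can extract a plain Python float, for example:

# value has shape (1,) here; .item() pulls out the scalar state value.
state_value = value.item()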
This approach is shown in the train_w_bc_finetune_w_ppo.py example and discussed in the forum thread "KeyError: 'advantages' on MARL".
Would you like a step-by-step breakdown or more details on handling batch/multi-agent cases?
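If it helps in the meantime, here is a rough sketch of the batched / multi-agent case; the module ID "policy_1" and the observations obs_1, obs_2 are illustrative placeholders, not names from your config:

import numpy as np
import torch

# In multi-agent setups, pass the module ID of the policy you want to query.
module = algo.get_module("policy_1")
# Stack several observations (placeholder NumPy arrays) along the first axis to form a batch.
obs_batch = torch.from_numpy(np.stack([obs_1, obs_2])).float()
values = module.compute_values({"obs": obs_batch})  # tensor of shape (2,)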
Hint: Mention @RunLLM in the post for followups.