I have a trained PPO model and now I want to give it a state observation and return the expected value of the state. I’m struggling to find how to do this with the new RLlib API.
1. Severity of the issue: (select one)
   - None: I'm just curious or want clarification.
   - Low: Annoying but doesn't hinder my work.
   - Medium: Significantly affects my productivity but can find a workaround.
   - High: Completely blocks me.
To get the value function output (the expected return) for a given state observation from a trained PPO model on the new RLlib API stack, call the RLModule's compute_values method. PPO's default RLModule implements the ValueFunctionAPI, so this method is available out of the box. First get the module from the algorithm (e.g., via algo.get_module()), then pass the observation as a batch dictionary with the key "obs" mapping to a torch tensor with a leading batch dimension. The method returns the value estimates as a tensor, one entry per batch element. For example:
import numpy as np
import torch

obs = ...  # your observation as a numpy array
module = algo.get_module()  # default RLModule of the trained algorithm
with torch.no_grad():  # inference only, no gradients needed
    value = module.compute_values(
        # cast to float32 and add a batch dimension of 1
        {"obs": torch.from_numpy(obs.astype(np.float32)).unsqueeze(0)}
    )