How do I compute an action from a trained RLlib PPO policy with the new API?

I’m trying to get a computed action from a trained PPO model, but all of the examples I’ve found use the old API, not the new one.

What is the new API equivalent to this?

action = my_trained_policy.compute_single_action(obs)

In the new RLlib API stack, you should use the RLModule’s forward_inference method instead of compute_single_action. Specifically, after loading your RLModule from a checkpoint, call rl_module.forward_inference({"obs": torch_obs_batch}) to get the action-distribution inputs, then sample or select an action from the resulting distribution. The old compute_single_action API is deprecated and not available in the new stack. See the official example here and discussion in this issue.
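
For reference, here is a minimal sketch of that flow for a continuous action space. It assumes a recent Ray release with the new API stack, a single-agent checkpoint using the default module ID "default_policy", and Pendulum-v1 standing in for your environment; the checkpoint path and its sub-directory layout are placeholders that may differ between Ray versions and your setup.

```python
from pathlib import Path

import gymnasium as gym
import numpy as np
import torch
from ray.rllib.core.rl_module.rl_module import RLModule

# Hypothetical checkpoint directory -- replace with your own training result dir.
ckpt = Path("~/ray_results/PPO_2025-01-01_00-00-00/checkpoint_000005").expanduser()

# In recent Ray versions the trained module is stored under this sub-path inside
# the checkpoint; the exact layout may differ slightly between releases.
rl_module = RLModule.from_checkpoint(
    ckpt / "learner_group" / "learner" / "rl_module" / "default_policy"
)

# Example continuous-action env standing in for your own environment.
env = gym.make("Pendulum-v1")
obs, _ = env.reset()

# Batch the single observation to shape (1, obs_dim).
torch_obs_batch = torch.from_numpy(np.asarray([obs], dtype=np.float32))

# forward_inference returns the action-distribution inputs
# (for a continuous space: mean and log-std of a diagonal Gaussian).
with torch.no_grad():
    fwd_out = rl_module.forward_inference({"obs": torch_obs_batch})

# Build the action distribution from those inputs and draw an action.
dist_cls = rl_module.get_inference_action_dist_cls()
action_dist = dist_cls.from_logits(fwd_out["action_dist_inputs"])
# .sample() draws a stochastic action; action_dist.to_deterministic() instead
# gives the distribution's mean (greedy) action.
action = action_dist.sample()[0].numpy()
print(action)
```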

Would you like a step-by-step code example for this process?


Hint: Mention @RunLLM in the post for followups.

A step-by-step code example would be great. I have a continuous action space, so I think I need to sample from the distribution. Will the sampled values correspond to “recommended” values that optimize the reward?