Inference with a trained model

How can I use my trained model for inference?

1. Severity of the issue: (select one)
:white_check_mark: None: I’m just curious or want clarification.

2. Environment:

  • Ray version: 2.51.1
  • Python version: Python 3.12.12
  • OS: macOS 26.2 (25C56)
  • Cloud/Infrastructure: None

**3. What happened vs. what you expected:**
Hi! I am new to RLlib and I am trying to do inference with an already trained model saved in my checkpoint folder. I can load and evaluate the model, and it gives satisfactory results. However, when I run a few simple inference steps using the .forward_inference() method, the behavior looks random.

Important note: I have a custom catalog and connector. The connector only does data preprocessing, which I reproduce in my rollout loop below with np.concatenate(...).

Maybe someone could help me with this?

Here is my code:

import os

import numpy as np
import ray
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.utils.torch_utils import convert_to_torch_tensor

# `config` is my own config object (checkpoint path, env settings, number of
# inference steps, and a helper that builds the RLlib AlgorithmConfig).
sac_config = config.to_rllib_config()
algo = sac_config.build_algo()

# Restore the trained algorithm state from the checkpoint folder.
checkpoint_path = os.path.abspath(config.checkpoint_path)
algo.restore(checkpoint_path)
print(f"Successfully loaded checkpoint from: {checkpoint_path}")

# Get the (default) RLModule and build an env instance for inference.
module = algo.get_module()
env = algo.env_creator(config.environment.model_dump())

for _ in range(config.inference_steps):
    obs, _ = env.reset()
    done = False
    episode_return = 0.0

    while not done:
        # Manually reproduce the connector's preprocessing: concatenate the
        # dict observation into a flat float32 tensor with a batch dim of 1.
        processed_obs = np.concatenate([value for value in obs.values()])
        processed_obs = torch.tensor(processed_obs, dtype=torch.float32).unsqueeze(0)
        input_dict = {Columns.OBS: processed_obs}

        # Forward pass, then sample an action from the returned distribution inputs.
        out = module.forward_inference(input_dict)
        logits = convert_to_torch_tensor(out[Columns.ACTION_DIST_INPUTS])
        actions = module.action_dist_cls.from_logits(logits).sample()
        actions = actions.detach().numpy().squeeze(0)

        obs, reward, terminated, truncated, _ = env.step(actions)
        episode_return += reward
        done = terminated or truncated

        env.render()

env.close()
algo.stop()
ray.shutdown()

To use your trained RLlib model for inference, you should load the RLModule from the checkpoint and preprocess observations exactly as during training, including any connector logic (e.g., normalization, concatenation). If you skip the connector pipeline or preprocess differently, inference results may be inconsistent or suboptimal. RLlib recommends using the EnvToModulePipeline and ModuleToEnvPipeline from the checkpoint for preprocessing and postprocessing, especially if you used custom connectors or filters during training. Directly calling .forward_inference() on the module with manually preprocessed observations can lead to mismatches and random behavior, as you observed.
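For example, here is a minimal sketch of how the checkpointed pieces can be restored without building the full Algorithm, loosely following the policy_inference_after_training_w_connector.py example. The import paths and component-path constants below are assumptions based on recent Ray 2.x releases, so please cross-check them against that example for your version; `config`, `sac_config`, and `env` refer to the objects you already build in your script.

import os

from ray.rllib.connectors.connector_pipeline_v2 import ConnectorPipelineV2
from ray.rllib.core import (
    COMPONENT_ENV_RUNNER,
    COMPONENT_ENV_TO_MODULE_CONNECTOR,
    COMPONENT_LEARNER,
    COMPONENT_LEARNER_GROUP,
    COMPONENT_RL_MODULE,
    DEFAULT_MODULE_ID,
)
from ray.rllib.core.rl_module.rl_module import RLModule

checkpoint_path = os.path.abspath(config.checkpoint_path)

# Restore the env-to-module connector pipeline that was saved with the
# checkpoint (it lives under the env-runner component).
env_to_module = ConnectorPipelineV2.from_checkpoint(
    os.path.join(
        checkpoint_path,
        COMPONENT_ENV_RUNNER,
        COMPONENT_ENV_TO_MODULE_CONNECTOR,
    )
)

# Restore only the trained RLModule (no full Algorithm needed for inference).
rl_module = RLModule.from_checkpoint(
    os.path.join(
        checkpoint_path,
        COMPONENT_LEARNER_GROUP,
        COMPONENT_LEARNER,
        COMPONENT_RL_MODULE,
        DEFAULT_MODULE_ID,
    )
)

# Build the module-to-env pipeline from the same config and env.
module_to_env = sac_config.build_module_to_env_connector(env)

The module-to-env pipeline then takes care of turning the module output into env actions (sampling or greedy pick, clipping/unsquashing), so you don't have to do that by hand.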

For correct inference, load the connector pipelines from the checkpoint and use them to process observations and actions, ensuring consistency with training. See the official policy_inference_after_training_w_connector.py example for a full workflow. If you only use .forward_inference() directly, make sure your preprocessing exactly matches what the connector did during training, or else use the connector pipeline itself.
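Here is a sketch of the corresponding inference loop, again modeled on that example. The connector pipelines operate on SingleAgentEpisode objects, so the loop keeps one up to date instead of hand-building the input dict; treat the exact argument names as assumptions to verify against the example for your Ray version.

from ray.rllib.core.columns import Columns
from ray.rllib.env.single_agent_episode import SingleAgentEpisode

obs, _ = env.reset()
episode = SingleAgentEpisode(
    observations=[obs],
    observation_space=env.observation_space,
    action_space=env.action_space,
)

while not episode.is_done:
    shared_data = {}

    # Env-to-module: applies exactly the same preprocessing as during training.
    input_dict = env_to_module(
        episodes=[episode],  # ConnectorV2 pipelines operate on lists of episodes.
        rl_module=rl_module,
        explore=False,
        shared_data=shared_data,
    )

    # Deterministic forward pass (use forward_exploration() to sample instead).
    rl_module_out = rl_module.forward_inference(input_dict)

    # Module-to-env: converts the module output into actual env actions
    # (sampling or greedy pick, action clipping/unsquashing, etc.).
    to_env = module_to_env(
        batch=rl_module_out,
        episodes=[episode],
        rl_module=rl_module,
        explore=False,
        shared_data=shared_data,
    )

    # The pipelines work on batched data (B=1 here), so take element [0].
    action = to_env.pop(Columns.ACTIONS)[0]
    obs, reward, terminated, truncated, _ = env.step(action)

    # Log the step back into the episode; the connectors read from it.
    episode.add_env_step(
        obs,
        action,
        reward,
        terminated=terminated,
        truncated=truncated,
        extra_model_outputs={k: v[0] for k, v in to_env.items()},
    )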

Would you like a more detailed, step-by-step walkthrough of this process?

Hint: Mention @RunLLM in the post for followups.
