Inference with a trained model

How can I use my trained model for inference?

1. Severity of the issue: (select one)
:white_check_mark: None: I’m just curious or want clarification.

2. Environment:

  • Ray version: 2.51.1
  • Python version: Python 3.12.12
  • OS: macOS 26.2 (25C56)
  • Cloud/Infrastructure: None

**3. What happened vs. what you expected:**
Hi! I am new to RLlib and I am trying to do inference with an already trained model saved in my checkpoint folder. I can load and evaluate the model, and it gives satisfactory results. However, when I run a few simple inference steps using the .forward_inference() method, the behavior looks random.

Important note: I have a custom catalog and connector. The connector only does data preprocessing, which I reproduce in my rollout loop below with np.concatenate(...).

Maybe someone could help me with this?

Here is my code:

import os

import numpy as np
import ray
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.utils.torch_utils import convert_to_torch_tensor

# `config` is my own config object (checkpoint path, env settings, number of
# inference steps, and a helper that builds the RLlib AlgorithmConfig).
sac_config = config.to_rllib_config()
algo = sac_config.build_algo()

# Restore the trained algorithm state from the checkpoint folder.
checkpoint_path = os.path.abspath(config.checkpoint_path)
algo.restore(checkpoint_path)
print(f"Successfully loaded checkpoint from: {checkpoint_path}")

# Get the (default) RLModule and build an env instance for inference.
module = algo.get_module()
env = algo.env_creator(config.environment.model_dump())

for _ in range(config.inference_steps):
    obs, _ = env.reset()
    done = False
    episode_return = 0.0

    while not done:
        # Manually reproduce the connector's preprocessing: concatenate the
        # dict observation into a flat float32 tensor with a batch dim of 1.
        processed_obs = np.concatenate([value for value in obs.values()])
        processed_obs = torch.tensor(processed_obs, dtype=torch.float32).unsqueeze(0)
        input_dict = {Columns.OBS: processed_obs}

        # Forward pass, then sample an action from the returned distribution inputs.
        out = module.forward_inference(input_dict)
        logits = convert_to_torch_tensor(out[Columns.ACTION_DIST_INPUTS])
        actions = module.action_dist_cls.from_logits(logits).sample()
        actions = actions.detach().numpy().squeeze(0)

        obs, reward, terminated, truncated, _ = env.step(actions)
        episode_return += reward
        done = terminated or truncated

        env.render()

env.close()
algo.stop()
ray.shutdown()

To use your trained RLlib model for inference, you should load the RLModule from the checkpoint and preprocess observations exactly as during training, including any connector logic (e.g., normalization, concatenation). If you skip the connector pipeline or preprocess differently, inference results may be inconsistent or suboptimal. RLlib recommends using the EnvToModulePipeline and ModuleToEnvPipeline from the checkpoint for preprocessing and postprocessing, especially if you used custom connectors or filters during training. Directly calling .forward_inference() on the module with manually preprocessed observations can lead to mismatches and random behavior, as you observed.
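For example, here is a minimal sketch of how the checkpointed pieces can be restored without building the full Algorithm, loosely following the policy_inference_after_training_w_connector.py example. The import paths and component-path constants below are assumptions based on recent Ray 2.x releases, so please cross-check them against that example for your version; `config`, `sac_config`, and `env` refer to the objects you already build in your script.

import os

from ray.rllib.connectors.connector_pipeline_v2 import ConnectorPipelineV2
from ray.rllib.core import (
    COMPONENT_ENV_RUNNER,
    COMPONENT_ENV_TO_MODULE_CONNECTOR,
    COMPONENT_LEARNER,
    COMPONENT_LEARNER_GROUP,
    COMPONENT_RL_MODULE,
    DEFAULT_MODULE_ID,
)
from ray.rllib.core.rl_module.rl_module import RLModule

checkpoint_path = os.path.abspath(config.checkpoint_path)

# Restore the env-to-module connector pipeline that was saved with the
# checkpoint (it lives under the env-runner component).
env_to_module = ConnectorPipelineV2.from_checkpoint(
    os.path.join(
        checkpoint_path,
        COMPONENT_ENV_RUNNER,
        COMPONENT_ENV_TO_MODULE_CONNECTOR,
    )
)

# Restore only the trained RLModule (no full Algorithm needed for inference).
rl_module = RLModule.from_checkpoint(
    os.path.join(
        checkpoint_path,
        COMPONENT_LEARNER_GROUP,
        COMPONENT_LEARNER,
        COMPONENT_RL_MODULE,
        DEFAULT_MODULE_ID,
    )
)

# Build the module-to-env pipeline from the same config and env.
module_to_env = sac_config.build_module_to_env_connector(env)

The module-to-env pipeline then takes care of turning the module output into env actions (sampling or greedy pick, clipping/unsquashing), so you don't have to do that by hand.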

For correct inference, load the connector pipelines from the checkpoint and use them to process observations and actions, ensuring consistency with training. See the official policy_inference_after_training_w_connector.py example for a full workflow. If you only use .forward_inference() directly, make sure your preprocessing exactly matches what the connector did during training, or else use the connector pipeline itself.
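Here is a sketch of the corresponding inference loop, again modeled on that example. The connector pipelines operate on SingleAgentEpisode objects, so the loop keeps one up to date instead of hand-building the input dict; treat the exact argument names as assumptions to verify against the example for your Ray version.

from ray.rllib.core.columns import Columns
from ray.rllib.env.single_agent_episode import SingleAgentEpisode

obs, _ = env.reset()
episode = SingleAgentEpisode(
    observations=[obs],
    observation_space=env.observation_space,
    action_space=env.action_space,
)

while not episode.is_done:
    shared_data = {}

    # Env-to-module: applies exactly the same preprocessing as during training.
    input_dict = env_to_module(
        episodes=[episode],  # ConnectorV2 pipelines operate on lists of episodes.
        rl_module=rl_module,
        explore=False,
        shared_data=shared_data,
    )

    # Deterministic forward pass (use forward_exploration() to sample instead).
    rl_module_out = rl_module.forward_inference(input_dict)

    # Module-to-env: converts the module output into actual env actions
    # (sampling or greedy pick, action clipping/unsquashing, etc.).
    to_env = module_to_env(
        batch=rl_module_out,
        episodes=[episode],
        rl_module=rl_module,
        explore=False,
        shared_data=shared_data,
    )

    # The pipelines work on batched data (B=1 here), so take element [0].
    action = to_env.pop(Columns.ACTIONS)[0]
    obs, reward, terminated, truncated, _ = env.step(action)

    # Log the step back into the episode; the connectors read from it.
    episode.add_env_step(
        obs,
        action,
        reward,
        terminated=terminated,
        truncated=truncated,
        extra_model_outputs={k: v[0] for k, v in to_env.items()},
    )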

Would you like a more detailed, step-by-step walkthrough of this process?

Hint: Mention @RunLLM in the post for followups.
