How do I give inputs to a model and get its outputs?

I run the code below:

"""Example of using a custom image env and model.

Both the model and env are trivial (and super-fast), so they are useful
for running perf microbenchmarks.
"""

import argparse
import os

import ray
import ray.tune as tune
from ray.tune import sample_from
from fast_image_env import FastImageEnv
from fast_model import TorchFastModel, TorchCustomFastModel
from ray.rllib.models import ModelCatalog
from ray.rllib.agents.ppo import PPOTrainer

if __name__ == "__main__":
   
    ray.shutdown()
    ray.init()

    config = {
        "env": FastImageEnv,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": 0,
        "num_workers": 1,
        "framework": "torch",
    }
    
    trainer = PPOTrainer(config=config)
    print(trainer.get_policy().model)
    
    ray.shutdown()

The code prints the following model summary:

VisionNetwork(
  (_logits): SlimConv2d(
    (_model): Sequential(
      (0): ZeroPad2d(padding=(0, 0, 0, 0), value=0.0)
      (1): Conv2d(256, 2, kernel_size=[1, 1], stride=(1, 1))
    )
  )
  (_convs): Sequential(
    (0): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(2, 2, 2, 2), value=0.0)
        (1): Conv2d(4, 16, kernel_size=[8, 8], stride=(4, 4))
        (2): ReLU()
      )
    )
    (1): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(1, 2, 1, 2), value=0.0)
        (1): Conv2d(16, 32, kernel_size=[4, 4], stride=(2, 2))
        (2): ReLU()
      )
    )
    (2): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(32, 256, kernel_size=[11, 11], stride=(1, 1))
        (1): ReLU()
      )
    )
  )
  (_value_branch_separate): Sequential(
    (0): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(2, 2, 2, 2), value=0.0)
        (1): Conv2d(4, 16, kernel_size=[8, 8], stride=(4, 4))
        (2): ReLU()
      )
    )
    (1): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(1, 2, 1, 2), value=0.0)
        (1): Conv2d(16, 32, kernel_size=[4, 4], stride=(2, 2))
        (2): ReLU()
      )
    )
    (2): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(32, 256, kernel_size=[11, 11], stride=(1, 1))
        (1): ReLU()
      )
    )
    (3): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1))
      )
    )
  )
)

How do I feed the env's observations to the model and get the action and value outputs back from it?

Hi bug404,
Feeding the env's observations to the model and getting the action and value outputs back is generally something that RLlib does behind the scenes. It is a task at the heart of every RL framework and thus should not have to be reimplemented by every user.

You run an experiment, which uses rollout workers, which apply policies, which in turn use your model. Have a look at this 60-second document for a better explanation.
The details and the code related to this are, imho, fairly complex. Do you want to dig into that?
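
To make that flow concrete, here is a minimal sketch of the usual loop, in which RLlib steps the env and pushes the observations through your model for you (the CartPole env and the three iterations are just placeholders, not part of your setup):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# RLlib's rollout workers step the env, push observations through the
# policy/model, and collect the resulting actions and value predictions.
trainer = PPOTrainer(config={
    "env": "CartPole-v0",
    "num_workers": 1,
    "framework": "torch",
})

for i in range(3):
    result = trainer.train()  # one iteration of sampling + optimization
    print(i, result["episode_reward_mean"])

ray.shutdown()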


Hi @bug404,

You can feed observations directly into the policy like this:

import argparse
import os

import ray
import ray.tune as tune
from ray.tune import sample_from
#from fast_image_env import FastImageEnv
#from fast_model import TorchFastModel, TorchCustomFastModel
from ray.rllib.models import ModelCatalog
from ray.rllib.agents.ppo import PPOTrainer

if __name__ == "__main__":

    ray.shutdown()
    ray.init()

    config = {
        "env": "CartPole-v0",
#        "env": FastImageEnv,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": 0,
        "num_workers": 1,
        "framework": "torch",
    }

    trainer = PPOTrainer(config=config)
    print(trainer.get_policy().model)

    # Feed a batch of 10 random 4-dim observations (CartPole's obs space)
    # directly to the policy; returns actions, state-outs, and extra fetches.
    import numpy as np
    print(trainer.get_policy().compute_actions(np.random.random((10, 4))))

This example returns the following:

(array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0]), [], {'vf_preds': array([0.00250419, 0.00358895, 0.00189365, 0.00273949, 0.00215711,
       0.00187746, 0.00327462, 0.00141259, 0.00436519, 0.00338652],
      dtype=float32), 'action_dist_inputs': array([[-0.0058025 , -0.00861614],
       [-0.00517484, -0.01290101],
       [-0.00447717, -0.00948256],
       [-0.00315521, -0.00787028],
       [-0.00374755, -0.00866612],
       [-0.00446876, -0.01163902],
       [-0.00419985, -0.00884008],
       [-0.00214083, -0.00698731],
       [-0.00272015, -0.00557604],
       [-0.0058585 , -0.00616745]], dtype=float32), 'action_prob': array([0.50070345, 0.50193155, 0.49874866, 0.4988213 , 0.4987704 ,
       0.49820745, 0.5011601 , 0.50121164, 0.500714  , 0.50007725],
      dtype=float32), 'action_logp': array([-0.6917413 , -0.68929154, -0.695653  , -0.6955074 , -0.69560945,
       -0.6967387 , -0.69082975, -0.6907269 , -0.69172025, -0.6929927 ],
      dtype=float32)})

You can find the return value format here: RLlib Package Reference — Ray v2.0.0.dev0
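
If you specifically need the value output, it is already part of the extra-fetches dict above under the key vf_preds. You can also call the model directly; below is a minimal sketch assuming the default torch model and RLlib's ModelV2 call convention (the (10, 4) obs shape just matches CartPole):

import numpy as np
import torch

policy = trainer.get_policy()
model = policy.model

# The ModelV2 API takes an input dict; observations go in as a float tensor.
obs = torch.as_tensor(np.random.random((10, 4)), dtype=torch.float32)
logits, state = model({"obs": obs}, [], None)

# The value branch is evaluated on the same forward pass and read afterwards.
values = model.value_function()
print(logits.shape)   # action-distribution inputs, e.g. (10, 2) for CartPole
print(values.shape)   # per-observation value estimates, shape (10,)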


Wow, that’s very useful. Thank you very much.
