How do I give inputs to a model and get its outputs?

I run the code below:

"""Example of using a custom image env and model.

Both the model and env are trivial (and super-fast), so they are useful
for running perf microbenchmarks.
"""

import argparse
import os

import ray
import ray.tune as tune
from ray.tune import sample_from
from fast_image_env import FastImageEnv
from fast_model import TorchFastModel, TorchCustomFastModel
from ray.rllib.models import ModelCatalog
from ray.rllib.agents.ppo import PPOTrainer

if __name__ == "__main__":
   
    ray.shutdown()
    ray.init()

    config = {
        "env": FastImageEnv,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": 0,
        "num_workers": 1,
        "framework": "torch",
    }
    
    trainer = PPOTrainer(config=config)
    print(trainer.get_policy().model)
    
    ray.shutdown()

The code prints the following model summary:

VisionNetwork(
  (_logits): SlimConv2d(
    (_model): Sequential(
      (0): ZeroPad2d(padding=(0, 0, 0, 0), value=0.0)
      (1): Conv2d(256, 2, kernel_size=[1, 1], stride=(1, 1))
    )
  )
  (_convs): Sequential(
    (0): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(2, 2, 2, 2), value=0.0)
        (1): Conv2d(4, 16, kernel_size=[8, 8], stride=(4, 4))
        (2): ReLU()
      )
    )
    (1): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(1, 2, 1, 2), value=0.0)
        (1): Conv2d(16, 32, kernel_size=[4, 4], stride=(2, 2))
        (2): ReLU()
      )
    )
    (2): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(32, 256, kernel_size=[11, 11], stride=(1, 1))
        (1): ReLU()
      )
    )
  )
  (_value_branch_separate): Sequential(
    (0): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(2, 2, 2, 2), value=0.0)
        (1): Conv2d(4, 16, kernel_size=[8, 8], stride=(4, 4))
        (2): ReLU()
      )
    )
    (1): SlimConv2d(
      (_model): Sequential(
        (0): ZeroPad2d(padding=(1, 2, 1, 2), value=0.0)
        (1): Conv2d(16, 32, kernel_size=[4, 4], stride=(2, 2))
        (2): ReLU()
      )
    )
    (2): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(32, 256, kernel_size=[11, 11], stride=(1, 1))
        (1): ReLU()
      )
    )
    (3): SlimConv2d(
      (_model): Sequential(
        (0): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1))
      )
    )
  )
)

How do I feed the env's observations to the model and get the action and value outputs back from it?

Hi bug404,
Feeding the env's observations to the model and getting the action and value outputs back is generally something that RLlib does behind the scenes. It is a task at the heart of every RL framework and thus should not have to be reimplemented by every user.

You run an experiment, which uses rollout workers, which apply policies, which in turn use your model. Have a look at this 60-second document for a better explanation.
The details and the code related to this are, imho, fairly complex. Do you want to dig into that?
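
To make that flow concrete, here is a minimal sketch of the usual loop, in which RLlib steps the env and pushes the observations through your model for you (the CartPole env and the three iterations are just placeholders, not part of your setup):

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# RLlib's rollout workers step the env, push observations through the
# policy/model, and collect the resulting actions and value predictions.
trainer = PPOTrainer(config={
    "env": "CartPole-v0",
    "num_workers": 1,
    "framework": "torch",
})

for i in range(3):
    result = trainer.train()  # one iteration of sampling + optimization
    print(i, result["episode_reward_mean"])

ray.shutdown()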


Hi @bug404,

You can feed observations directly into the policy like this:

import argparse
import os

import ray
import ray.tune as tune
from ray.tune import sample_from
#from fast_image_env import FastImageEnv
#from fast_model import TorchFastModel, TorchCustomFastModel
from ray.rllib.models import ModelCatalog
from ray.rllib.agents.ppo import PPOTrainer

if __name__ == "__main__":

    ray.shutdown()
    ray.init()

    config = {
        "env": "CartPole-v0",
#        "env": FastImageEnv,
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        "num_gpus": 0,
        "num_workers": 1,
        "framework": "torch",
    }

    trainer = PPOTrainer(config=config)
    print(trainer.get_policy().model)

    # Feed a batch of 10 random 4-dim observations (CartPole's obs space)
    # directly to the policy; returns actions, state-outs, and extra fetches.
    import numpy as np
    print(trainer.get_policy().compute_actions(np.random.random((10, 4))))

This example returns the following:

(array([0, 0, 1, 1, 1, 1, 0, 0, 0, 0]), [], {'vf_preds': array([0.00250419, 0.00358895, 0.00189365, 0.00273949, 0.00215711,
       0.00187746, 0.00327462, 0.00141259, 0.00436519, 0.00338652],
      dtype=float32), 'action_dist_inputs': array([[-0.0058025 , -0.00861614],
       [-0.00517484, -0.01290101],
       [-0.00447717, -0.00948256],
       [-0.00315521, -0.00787028],
       [-0.00374755, -0.00866612],
       [-0.00446876, -0.01163902],
       [-0.00419985, -0.00884008],
       [-0.00214083, -0.00698731],
       [-0.00272015, -0.00557604],
       [-0.0058585 , -0.00616745]], dtype=float32), 'action_prob': array([0.50070345, 0.50193155, 0.49874866, 0.4988213 , 0.4987704 ,
       0.49820745, 0.5011601 , 0.50121164, 0.500714  , 0.50007725],
      dtype=float32), 'action_logp': array([-0.6917413 , -0.68929154, -0.695653  , -0.6955074 , -0.69560945,
       -0.6967387 , -0.69082975, -0.6907269 , -0.69172025, -0.6929927 ],
      dtype=float32)})

You can find the return value format here: RLlib Package Reference — Ray v2.0.0.dev0
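
If you specifically need the value output, it is already part of the extra-fetches dict above under the key vf_preds. You can also call the model directly; below is a minimal sketch assuming the default torch model and RLlib's ModelV2 call convention (the (10, 4) obs shape just matches CartPole):

import numpy as np
import torch

policy = trainer.get_policy()
model = policy.model

# The ModelV2 API takes an input dict; observations go in as a float tensor.
obs = torch.as_tensor(np.random.random((10, 4)), dtype=torch.float32)
logits, state = model({"obs": obs}, [], None)

# The value branch is evaluated on the same forward pass and read afterwards.
values = model.value_function()
print(logits.shape)   # action-distribution inputs, e.g. (10, 2) for CartPole
print(values.shape)   # per-observation value estimates, shape (10,)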


Wow, that’s very useful. Thank you very much.
