How to print the TF model?

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Excuse the likely dumbness of my question, but I can't figure out how to print or visualize my model like this:

> policy.model.base_model.summary()
> Model: "model"
> _____________________________________________________________________
> Layer (type)               Output Shape  Param #  Connected to
> =====================================================================
> observations (InputLayer)  [(None, 4)]   0
> _____________________________________________________________________
> fc_1 (Dense)               (None, 256)   1280     observations[0][0]
> _____________________________________________________________________
> fc_value_1 (Dense)         (None, 256)   1280     observations[0][0]
> _____________________________________________________________________
> fc_2 (Dense)               (None, 256)   65792    fc_1[0][0]
> _____________________________________________________________________
> fc_value_2 (Dense)         (None, 256)   65792    fc_value_1[0][0]
> _____________________________________________________________________
> fc_out (Dense)             (None, 2)     514      fc_2[0][0]
> _____________________________________________________________________
> value_out (Dense)          (None, 1)     257      fc_value_2[0][0]
> =====================================================================

This is my code:

config = (
    PPOConfig()
    .resources(num_gpus=1, num_cpus_per_worker=1, num_gpus_per_worker=0.2)
    .environment("SimpleEnv", disable_env_checking=True)
    .rollouts(
        num_rollout_workers=1,
        num_envs_per_worker=1,
        rollout_fragment_length="auto",
        batch_mode="complete_episodes",
        preprocessor_pref=None,
        observation_filter="NoFilter",
        compress_observations=False,
    )
    .framework(framework="tf2", eager_tracing=False)
    .experimental(_disable_preprocessor_api=True)
)
algo = config.build()

But then
print(algo.get_policy().model)
Prints:
<ray.rllib.models.tf.complex_input_net.ComplexInputNetwork object at 0x000001A60B9062F0>

And
algo.get_policy().model.base_model.summary()
throws an error:
AttributeError: 'ComplexInputNetwork' object has no attribute 'base_model'

@PREJAN, as you can see in the source code of the ComplexInputNetwork, this network can itself contain multiple sub-models. Since you did not post your action/observation spaces we cannot give a precise answer here, but you might find your model under algo.get_policy().model.logits_and_value_model

1 Like

Thank you @Lars_Simon_Zehnder
algo.get_policy().model.logits_and_value_model.summary()
works perfectly and prints the model.

For reference,

These are my obs/action spaces:
Box(0.0, 1.0, (12, 32), float32), Box(-1.0, 1.0, (3,), float32)

And this is the output of the model summary

__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 input_1 (InputLayer)           [(None, 256)]        0           []

 logits (Dense)                 (None, 6)            1542        ['input_1[0][0]']

 value_out (Dense)              (None, 1)            257         ['input_1[0][0]']

==================================================================================================
Total params: 1,799
Trainable params: 1,799
Non-trainable params: 0

It strikes me that the input layer has 256 dimensions (not the observation space's 12 * 32 = 384) and that value_out has one dimension (not the action space's 3), but I'm still at an early learning stage and there are many concepts I don't grasp yet. Could it be that this is printing just part of the model?

Thanks for your support Lars

2 Likes

@PREJAN , I am happy that you found a way to print your model to understand better the architecture of it.

Because your observation space is two-dimensional, the ComplexInputNetwork is chosen automatically.

On your understanding of the inputs and outputs: take a look at the input layer of the ComplexInputNetwork. What you see is that this input does not come directly from the observation space but from the post_fc_stack, so it is a pre-processed embedding. The embedding size can be set in your config via config["model"]["post_fcnet_hiddens"] (the default is 256).
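For example, the embedding size could be changed like this. This is only a minimal sketch assuming a recent Ray 2.x-style PPOConfig; the value 128 is an arbitrary illustration, not a recommendation:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: set the post-FC embedding to 128 instead of the default 256.
# (Assumes a Ray 2.x-style PPOConfig API; adapt to your version.)
config = (
    PPOConfig()
    .framework("tf2")
    .training(model={"post_fcnet_hiddens": [128]})
)
```

After building with this config, the logits_and_value_model's input layer should show (None, 128) instead of (None, 256).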
As for the output, the model has two branches, namely value_out and logits. The former is the estimate of the value function and should have dimension 1. The latter is the action output and, for a continuous action space, should have dimension 2 x action_space.shape, so here 6. Why is this? What do you think?

2 Likes

Hi @PREJAN,

To see all the layers, you would want to print these three attributes, in this order:

[cnns, post_fc_stack, logits_and_value_model]

Some of them might be empty/None depending on your settings.
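The inspection above could be wrapped in a small helper. This is a hypothetical sketch, not part of RLlib; the three attribute names come from the ComplexInputNetwork source mentioned earlier, and the function just skips whatever is missing:

```python
def summarize_submodels(model,
                        names=("cnns", "post_fc_stack", "logits_and_value_model")):
    """Print each sub-model of a ComplexInputNetwork-like object.

    Hypothetical helper (not an RLlib API); the attribute names follow
    RLlib's ComplexInputNetwork source. Returns the names actually found.
    """
    found = []
    for name in names:
        sub = getattr(model, name, None)
        if not sub:  # missing, None, or an empty dict of CNNs
            print(f"{name}: not present")
            continue
        found.append(name)
        if hasattr(sub, "summary"):
            sub.summary()              # a Keras model
        else:
            print(f"{name}: {sub}")    # e.g. a dict mapping obs components to CNNs
    return found
```

Usage would be something like summarize_submodels(algo.get_policy().model).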

2 Likes

So as not to be rude, I won't delay my answer any longer, though I've researched and tried to find the answer to the question. I'm still not quite there... my best guess now would be: a mean and a distribution (spread) for each action-space dimension? It's still unclear to me, but the research is helping me understand PPO a bit better, so thank you for that. For now I understand it has the actor and critic networks, where the critic outputs one single number representing the value of the state we ended up in (as you explained in your comment), as a way to criticise the action proposed by the actor network. And I understand the output of the actor network is a probability distribution over actions (thus... mean and distribution), and that's where I'm still a bit lost, but I'll get there eventually I hope :smiley:
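The guess above can be checked with quick arithmetic. RLlib's default action distribution for a continuous Box space is a diagonal Gaussian, which needs two parameters per action dimension (a mean and a log standard deviation); the numbers below are taken from the spaces posted earlier in the thread:

```python
# Box(-1.0, 1.0, (3,), float32): three continuous action dimensions.
action_shape = (3,)

action_dim = 1
for d in action_shape:
    action_dim *= d            # -> 3

# A diagonal Gaussian needs two parameters per dimension: a mean and a log-std.
num_logits = 2 * action_dim
print(num_logits)              # -> 6, matching the `logits (Dense) (None, 6)` row
```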

I'll also try to turn my observation space into a 1D array to get a simpler model; since it does not represent an image, I guess CNNs are not relevant, and that would simplify my learning path with a not-so-"complex" input network.
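The flattening step itself is simple. Here is a pure-Python sketch of the reshaping; in a real env you would reshape the numpy array in step()/reset() or wrap the env (for instance, gymnasium's FlattenObservation wrapper does this):

```python
# Sketch: flatten a (12, 32) observation into a 384-vector so RLlib
# would pick a plain fully connected net instead of ComplexInputNetwork.
def flatten_obs(obs_2d):
    """Row-major flatten of a nested-list observation."""
    return [x for row in obs_2d for x in row]

obs = [[0.0] * 32 for _ in range(12)]   # stand-in for a Box(0, 1, (12, 32)) sample
flat = flatten_obs(obs)
print(len(flat))                        # -> 384 = 12 * 32
```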

Thanks @Lars_Simon_Zehnder , it was very helpful and instructive

1 Like

@mannyv thanks also for that tip, also very clarifying. I guess it's called a complex input network because it's composed first of some CNN layers to deal with the 2D images, then fully connected layers to do brain magic, and finally the logits and values acting as the actor-critic's expected outputs. I hope this is close enough to not sound stupid : )