The problem of multiple model calls

I ran the file test_rllib_demo.py, which I wrote by modifying the official custom_fast_model.py example and the TorchFastModel class in ray.rllib.examples.models.fast_model.py. I added some print() calls to the model's __init__() and forward() methods.
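
Roughly, the modified model looks like the following (a minimal sketch based on RLlib's TorchFastModel and the standard TorchModelV2 API; the exact print() statements in my script may differ slightly):

import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.misc import SlimFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2


class TorchFastModel(TorchModelV2, nn.Module):
    """Copy of RLlib's TorchFastModel with print() calls added."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        print("obs space: ", obs_space)
        print("action_space: ", action_space)
        print("num outpus: ", num_outputs)
        print("model config: ", model_config)
        print("name: ", name)

        self.bias = nn.Parameter(
            torch.tensor([0.0], dtype=torch.float32, requires_grad=True))
        # Only needed to give the optimizer some parameters (never used).
        self.dummy_layer = SlimFC(1, 1)
        self._output = None

    def forward(self, input_dict, state, seq_lens):
        print("input dict: ", input_dict)
        print("input dict obs shape: ", input_dict["obs"].shape)
        print("state: ", state)
        print("seq lens: ", seq_lens)
        self._output = self.bias + torch.zeros(
            size=(input_dict["obs"].shape[0], self.num_outputs))
        return self._output, []

    def value_function(self):
        assert self._output is not None, "must call forward() first!"
        return torch.mean(self._output, -1)


# Registered under the name that shows up as 'custom_model' in the config below.
ModelCatalog.register_custom_model("fast_model", TorchFastModel)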

When I run test_rllib_demo.py, the output is:

(ray) yan@DESKTOP-P7IV52N:~/deep-rl-with-robots/test$ python test_rllib_demo.py 
/home/yan/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
2021-05-07 22:51:52,559 INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-07 22:51:55,391 INFO trainer.py:696 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=4221) obs space:  Box(0.0, 1.0, (84, 84, 4), float32)
(pid=4221) action_space:  Discrete(2)
(pid=4221) num outpus:  2
(pid=4221) model config:  {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
(pid=4221) name:  default_model
obs space:  Box(0.0, 1.0, (84, 84, 4), float32)
action_space:  Discrete(2)
num outpus:  2
model config:  {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
name:  default_model
(pid=4221) input dict:  SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
(pid=4221) input dict obs shape:  torch.Size([32, 84, 84, 4])
(pid=4221) state:  []
(pid=4221) seq lens:  None
(pid=4221) input dict:  SampleBatch(['obs', 'seq_lens', 'obs_flat'])
(pid=4221) input dict obs shape:  torch.Size([1, 84, 84, 4])
(pid=4221) state:  []
(pid=4221) seq lens:  [1]
(pid=4221) input dict:  SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
(pid=4221) input dict obs shape:  torch.Size([32, 84, 84, 4])
(pid=4221) state:  []
(pid=4221) seq lens:  None
input dict:  SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
input dict obs shape:  torch.Size([32, 84, 84, 4])
state:  []
seq lens:  None
input dict:  SampleBatch(['obs', 'seq_lens', 'obs_flat'])
input dict obs shape:  torch.Size([1, 84, 84, 4])
state:  []
seq lens:  [1]
input dict:  SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
input dict obs shape:  torch.Size([32, 84, 84, 4])
state:  []
seq lens:  None
2021-05-07 22:51:57,934 WARNING util.py:53 -- Install gputil for GPU system monitoring.
TorchFastModel(
  (dummy_layer): SlimFC(
    (_model): Sequential(
      (0): Linear(in_features=1, out_features=1, bias=True)
    )
  )
)

My question is: why are the same variables printed multiple times? For example, the obs shape torch.Size([32, 84, 84, 4]) shows up several times.

Hi @bug404,

See this comment for an explanation: Initialise loss from dummy batch method in policy.py - #2 by mannyv
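
In short, the duplication comes from two things: (1) RLlib builds the policy, and therefore the custom model, once on the local worker in the driver process and once on each remote rollout worker, which is why the same prints appear both with and without the (pid=4221) prefix; and (2) during policy construction, RLlib runs a few forward passes on fake/dummy batches (the "initialise loss from dummy batch" step linked above) before any real sampling happens. If you want to see exactly which code path triggers each call, one quick way is to print a short stack trace on every forward() call. A minimal sketch using only the standard traceback module, assuming the model class is the TorchFastModel from ray.rllib.examples.models.fast_model (substitute your own copy from test_rllib_demo.py):

import traceback

from ray.rllib.examples.models.fast_model import TorchFastModel

# Keep the original forward() and print a shortened stack trace on every
# call, so each invocation can be attributed to its caller (e.g. the
# dummy-batch loss initialization in policy.py vs. later sampling/training).
_orig_forward = TorchFastModel.forward

def _traced_forward(self, input_dict, state, seq_lens):
    print("".join(traceback.format_stack(limit=8)))
    return _orig_forward(self, input_dict, state, seq_lens)

TorchFastModel.forward = _traced_forward

The traces printed during startup should point back at the dummy-batch initialization in policy.py, while the ones printed later point at the sampler and the learner.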


Yeah, I just want to know why the initialization process calls the model's __init__() and forward() functions multiple times. I'm just interested in the calling process behind it.