I wrote a file, test_rllib_demo.py, by modifying the official custom_fast_model.py example, and I also made some changes to the TorchFastModel class in ray.rllib.examples.models.fast_model, adding a few print() calls.
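Roughly, my modified model looks like the sketch below. This is a reconstruction based on the print labels you can see in the output, with the model body paraphrased from the RLlib example, so details may not match my actual file exactly:

import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.misc import SlimFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.annotations import override


class TorchFastModel(TorchModelV2, nn.Module):
    """Copy of ray.rllib.examples.models.fast_model.TorchFastModel
    with my print() calls added."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        # My added prints: these produce the "obs space: ...", "action_space: ...",
        # "num outpus: ...", "model config: ..." and "name: ..." lines below.
        print("obs space:", obs_space)
        print("action_space:", action_space)
        print("num outpus:", num_outputs)  # (sic, label matches the output below)
        print("model config:", model_config)
        print("name:", name)

        self.bias = nn.Parameter(
            torch.tensor([0.0], dtype=torch.float32, requires_grad=True))
        # Only here so the optimizer has some parameters to work with
        # (this is the dummy_layer shown in the printed model at the end).
        self.dummy_layer = SlimFC(1, 1)
        self._output = None

    @override(ModelV2)
    def forward(self, input_dict, state, seq_lens):
        # My added prints: these produce the "input dict: ...", "input dict obs
        # shape: ...", "state: ..." and "seq lens: ..." lines below.
        print("input dict:", input_dict)
        print("input dict obs shape:", input_dict["obs"].shape)
        print("state:", state)
        print("seq lens:", seq_lens)

        # Output a constant (bias) for every observation in the batch.
        self._output = self.bias + torch.zeros(
            size=(input_dict["obs"].shape[0], self.num_outputs)).to(
                self.bias.device)
        return self._output, []

    @override(ModelV2)
    def value_function(self):
        assert self._output is not None, "must call forward() first!"
        return torch.reshape(torch.mean(self._output, -1), [-1])


# Registered under the name that appears as 'custom_model' in the model config.
ModelCatalog.register_custom_model("fast_model", TorchFastModel)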
When I run test_rllib_demo.py, the output is:
(ray) yan@DESKTOP-P7IV52N:~/deep-rl-with-robots/test$ python test_rllib_demo.py
/home/yan/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
"update your install command.", FutureWarning)
2021-05-07 22:51:52,559 INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-07 22:51:55,391 INFO trainer.py:696 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=4221) obs space: Box(0.0, 1.0, (84, 84, 4), float32)
(pid=4221) action_space: Discrete(2)
(pid=4221) num outpus: 2
(pid=4221) model config: {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
(pid=4221) name: default_model
obs space: Box(0.0, 1.0, (84, 84, 4), float32)
action_space: Discrete(2)
num outpus: 2
model config: {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
name: default_model
(pid=4221) input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([32, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: None
(pid=4221) input dict: SampleBatch(['obs', 'seq_lens', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([1, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: [1]
(pid=4221) input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([32, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: None
input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
input dict obs shape: torch.Size([32, 84, 84, 4])
state: []
seq lens: None
input dict: SampleBatch(['obs', 'seq_lens', 'obs_flat'])
input dict obs shape: torch.Size([1, 84, 84, 4])
state: []
seq lens: [1]
input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
input dict obs shape: torch.Size([32, 84, 84, 4])
state: []
seq lens: None
2021-05-07 22:51:57,934 WARNING util.py:53 -- Install gputil for GPU system monitoring.
TorchFastModel(
  (dummy_layer): SlimFC(
    (_model): Sequential(
      (0): Linear(in_features=1, out_features=1, bias=True)
    )
  )
)
My question is: why is the same variable printed multiple times? For example, the obs shape torch.Size([32, 84, 84, 4]) appears several times in the output.