I wrote a file, test_rllib_demo.py, by modifying the official custom_fast_model.py example, and I also made some changes to the TorchFastModel class in ray.rllib.examples.models.fast_model, adding a few print() calls.
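Roughly, my modified model looks like the sketch below. This is a reconstruction based on the print labels you can see in the output, with the model body paraphrased from the RLlib example, so details may not match my actual file exactly:

import torch
import torch.nn as nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.misc import SlimFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.annotations import override


class TorchFastModel(TorchModelV2, nn.Module):
    """Copy of ray.rllib.examples.models.fast_model.TorchFastModel
    with my print() calls added."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        # My added prints: these produce the "obs space: ...", "action_space: ...",
        # "num outpus: ...", "model config: ..." and "name: ..." lines below.
        print("obs space:", obs_space)
        print("action_space:", action_space)
        print("num outpus:", num_outputs)  # (sic, label matches the output below)
        print("model config:", model_config)
        print("name:", name)

        self.bias = nn.Parameter(
            torch.tensor([0.0], dtype=torch.float32, requires_grad=True))
        # Only here so the optimizer has some parameters to work with
        # (this is the dummy_layer shown in the printed model at the end).
        self.dummy_layer = SlimFC(1, 1)
        self._output = None

    @override(ModelV2)
    def forward(self, input_dict, state, seq_lens):
        # My added prints: these produce the "input dict: ...", "input dict obs
        # shape: ...", "state: ..." and "seq lens: ..." lines below.
        print("input dict:", input_dict)
        print("input dict obs shape:", input_dict["obs"].shape)
        print("state:", state)
        print("seq lens:", seq_lens)

        # Output a constant (bias) for every observation in the batch.
        self._output = self.bias + torch.zeros(
            size=(input_dict["obs"].shape[0], self.num_outputs)).to(
                self.bias.device)
        return self._output, []

    @override(ModelV2)
    def value_function(self):
        assert self._output is not None, "must call forward() first!"
        return torch.reshape(torch.mean(self._output, -1), [-1])


# Registered under the name that appears as 'custom_model' in the model config.
ModelCatalog.register_custom_model("fast_model", TorchFastModel)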
When I run test_rllib_demo.py, the output is:
(ray) yan@DESKTOP-P7IV52N:~/deep-rl-with-robots/test$ python test_rllib_demo.py
/home/yan/miniconda3/envs/ray/lib/python3.6/site-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
"update your install command.", FutureWarning)
2021-05-07 22:51:52,559 INFO services.py:1269 -- View the Ray dashboard at http://127.0.0.1:8265
2021-05-07 22:51:55,391 INFO trainer.py:696 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=4221) obs space: Box(0.0, 1.0, (84, 84, 4), float32)
(pid=4221) action_space: Discrete(2)
(pid=4221) num outpus: 2
(pid=4221) model config: {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
(pid=4221) name: default_model
obs space: Box(0.0, 1.0, (84, 84, 4), float32)
action_space: Discrete(2)
num outpus: 2
model config: {'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'num_framestacks': 0, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': 'fast_model', 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1, 'framestack': True}
name: default_model
(pid=4221) input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([32, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: None
(pid=4221) input dict: SampleBatch(['obs', 'seq_lens', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([1, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: [1]
(pid=4221) input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
(pid=4221) input dict obs shape: torch.Size([32, 84, 84, 4])
(pid=4221) state: []
(pid=4221) seq lens: None
input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'obs_flat'])
input dict obs shape: torch.Size([32, 84, 84, 4])
state: []
seq lens: None
input dict: SampleBatch(['obs', 'seq_lens', 'obs_flat'])
input dict obs shape: torch.Size([1, 84, 84, 4])
state: []
seq lens: [1]
input dict: SampleBatch(['obs', 'new_obs', 'actions', 'prev_actions', 'rewards', 'prev_rewards', 'dones', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'action_dist_inputs', 'action_prob', 'action_logp', 'advantages', 'value_targets', 'obs_flat'])
input dict obs shape: torch.Size([32, 84, 84, 4])
state: []
seq lens: None
2021-05-07 22:51:57,934 WARNING util.py:53 -- Install gputil for GPU system monitoring.
TorchFastModel(
  (dummy_layer): SlimFC(
    (_model): Sequential(
      (0): Linear(in_features=1, out_features=1, bias=True)
    )
  )
)
My question is: why is the same variable printed multiple times? For example, the obs shape torch.Size([32, 84, 84, 4]) appears several times in the output.