What is the default PPO network architecture?

Hi folks,

I’d appreciate it if you could help me find this information. I thought it would be in the documentation, but apparently not.

What is the default network architecture for RLlib’s PPO implementation? I’m using version 2.6.1 if that matters.

Thanks for your help,
Ram Rachum.

Hi, I’m using Tune, and when training starts it prints the following information with details about PPO and the environment:
Trial PPO_CacheEnv_17200_00000 started with configuration:
╭───────────────────────────────────────────────────────────────────────────╮
│ Trial PPO_CacheEnv_17200_00000 config │
├───────────────────────────────────────────────────────────────────────────┤
│ _AlgorithmConfig__prior_exploration_config │
│ _disable_action_flattening False │
│ _disable_execution_plan_api True │
│ _disable_initialize_loss_from_dummy_batch False │
│ _disable_preprocessor_api False │
│ _enable_new_api_stack False │
│ _fake_gpus False │
│ _is_atari │
│ _learner_class │
│ _rl_module_spec │
│ _tf_policy_handles_more_than_one_loss False │
│ action_mask_key action_mask │
│ action_space │
│ actions_in_input_normalized False │
│ always_attach_evaluation_results False │
│ auto_wrap_old_gym_envs True │
│ batch_mode truncate_episodes │
│ callbacks …efaultCallbacks'> │
│ checkpoint_trainable_policies_only False │
│ clip_actions False │
│ clip_param 0.3 │
│ clip_rewards │
│ compress_observations False │
│ count_steps_by env_steps │
│ create_env_on_driver False │
│ custom_eval_function │
│ delay_between_worker_restarts_s 60. │
│ disable_env_checking False │
│ eager_max_retraces 20 │
│ eager_tracing True │
│ enable_async_evaluation False │
│ enable_connectors True │
│ enable_tf1_exec_eagerly False │
│ entropy_coeff 0. │
│ entropy_coeff_schedule │
│ env …e_env4.CacheEnv'> │
│ env_config/C 10 │
│ env_config/cache_size 100 │
│ env_config/disable_env_checking True │
│ env_config/k1 7 │
│ env_config/k2 3 │
│ env_config/source_file …ipf/(20000,4).csv │
│ env_runner_cls │
│ env_task_fn │
│ evaluation_config │
│ evaluation_duration 10 │
│ evaluation_duration_unit episodes │
│ evaluation_interval │
│ evaluation_num_workers 0 │
│ evaluation_parallel_to_training False │
│ evaluation_sample_timeout_s 180. │
│ exploration_config/type StochasticSampling │
│ explore True │
│ export_native_model_files False │
│ fake_sampler False │
│ framework torch │
│ gamma 0.99 │
│ grad_clip │
│ grad_clip_by global_norm │
│ ignore_worker_failures False │
│ in_evaluation False │
│ input sampler │
│ keep_per_episode_custom_metrics False │
│ kl_coeff 0.2 │
│ kl_target 0.01 │
│ lambda 1. │
│ local_gpu_idx 0 │
│ local_tf_session_args/inter_op_parallelism_threads 8 │
│ local_tf_session_args/intra_op_parallelism_threads 8 │
│ log_level WARN │
│ log_sys_usage True │
│ logger_config │
│ logger_creator │
│ lr 0.01 │
│ lr_schedule │
│ max_num_worker_restarts 1000 │
│ max_requests_in_flight_per_sampler_worker 2 │
│ metrics_episode_collection_timeout_s 60. │
│ metrics_num_episodes_for_smoothing 100 │
│ min_sample_timesteps_per_iteration 0 │
│ min_time_s_per_iteration │
│ min_train_timesteps_per_iteration 0 │
│ model/_disable_action_flattening False │
│ model/_disable_preprocessor_api False │
│ model/_time_major False │
│ model/_use_default_native_models -1 │
│ model/always_check_shapes False │
│ model/attention_dim 64 │
│ model/attention_head_dim 32 │
│ model/attention_init_gru_gate_bias 2.0 │
│ model/attention_memory_inference 50 │
│ model/attention_memory_training 50 │
│ model/attention_num_heads 1 │
│ model/attention_num_transformer_units 1 │
│ model/attention_position_wise_mlp_dim 32 │
│ model/attention_use_n_prev_actions 0 │
│ model/attention_use_n_prev_rewards 0 │
│ model/conv_activation relu │
│ model/conv_filters │
│ model/custom_action_dist │
│ model/custom_model │
│ model/custom_preprocessor │
│ model/dim 84 │
│ model/encoder_latent_dim │
│ model/fcnet_activation tanh │
│ model/fcnet_hiddens [256, 256] │
│ model/framestack True │
│ model/free_log_std False │
│ model/grayscale False │
│ model/lstm_cell_size 256 │
│ model/lstm_use_prev_action False │
│ model/lstm_use_prev_action_reward -1 │
│ model/lstm_use_prev_reward False │
│ model/max_seq_len 20 │
│ model/no_final_linear False │
│ model/post_fcnet_activation relu │
│ model/post_fcnet_hiddens │
│ model/use_attention False │
│ model/use_lstm False │
│ model/vf_share_layers False │
│ model/zero_mean True │
│ normalize_actions True │
│ num_consecutive_worker_failures_tolerance 100 │
│ num_cpus_for_driver 2 │
│ num_cpus_per_learner_worker 1 │
│ num_cpus_per_worker 1 │
│ num_envs_per_worker 1 │
│ num_gpus 1 │
│ num_gpus_per_learner_worker 0 │
│ num_gpus_per_worker 0 │
│ num_learner_workers 0 │
│ num_sgd_iter 30 │
│ num_workers 2 │
│ observation_filter NoFilter │
│ observation_fn │
│ observation_space │
│ offline_sampling False │
│ ope_split_batch_by_episode True │
│ output │
│ output_compress_columns ['obs', 'new_obs'] │
│ output_max_file_size 67108864 │
│ placement_strategy PACK │
│ policies/default_policy …None, None, None) │
│ policies_to_train │
│ policy_map_cache -1 │
│ policy_map_capacity 100 │
│ policy_mapping_fn …t 0x7f400dbfb0d0> │
│ policy_states_are_swappable False │
│ postprocess_inputs False │
│ preprocessor_pref deepmind │
│ recreate_failed_workers False │
│ remote_env_batch_wait_ms 0 │
│ remote_worker_envs False │
│ render_env False │
│ replay_sequence_length │
│ restart_failed_sub_environments False │
│ rollout_fragment_length auto │
│ sample_async -1 │
│ sample_collector …leListCollector'> │
│ sampler_perf_stats_ema_coef │
│ seed │
│ sgd_minibatch_size 128 │
│ shuffle_buffer_size 0 │
│ shuffle_sequences True │
│ simple_optimizer -1 │
│ sync_filters_on_rollout_workers_timeout_s 60. │
│ synchronize_filters -1 │
│ tf_session_args/allow_soft_placement True │
│ tf_session_args/device_count/CPU 1 │
│ tf_session_args/gpu_options/allow_growth True │
│ tf_session_args/inter_op_parallelism_threads 2 │
│ tf_session_args/intra_op_parallelism_threads 2 │
│ tf_session_args/log_device_placement False │
│ torch_compile_learner False │
│ torch_compile_learner_dynamo_backend inductor │
│ torch_compile_learner_dynamo_mode │
│ torch_compile_learner_what_to_compile …ile.FORWARD_TRAIN │
│ torch_compile_worker False │
│ torch_compile_worker_dynamo_backend onnxrt │
│ torch_compile_worker_dynamo_mode │
│ train_batch_size 4000 │
│ update_worker_filter_stats True │
│ use_critic True │
│ use_gae True │
│ use_kl_loss True │
│ use_worker_filter_stats True │
│ validate_workers_after_construction True │
│ vf_clip_param 10. │
│ vf_loss_coeff 1. │
│ vf_share_layers -1 │
│ worker_cls -1 │
│ worker_health_probe_timeout_s 60 │
│ worker_restore_timeout_s 1800 │
╰───────────────────────────────────────────────────────────────────────────╯