PPO only runs several steps in one episode

How severely does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
  • Low: It annoys or frustrates me for a moment.
  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.
    High

Hi all,

I am training PPO on my custom environment. The code runs, but the process stops after 3 or 4 steps within one episode (my environment has time steps). I would appreciate it a lot if someone could help me with the code. My config code is below.

# Note: `env_config` and `CustomCallbacks` are defined elsewhere in my code,
# and "my_env" is registered as a custom environment.
from ray.rllib.algorithms.ppo import PPOConfig

def train_ppo():
    config = (
        PPOConfig()
        .training(
            train_batch_size_per_learner=64,
            mini_batch_size_per_learner=64,
            lambda_=0.95,
            kl_coeff=0.5,
            clip_param=0.1,
            vf_clip_param=10.0,
            entropy_coeff=0.01,
            num_sgd_iter=10,
            lr=0.00015,
            grad_clip=100.0,
            grad_clip_by="global_norm",
        )
        .environment("my_env", env_config=env_config)
        .rollouts(
            num_rollout_workers=1,
            rollout_fragment_length=50,
        )
        .rl_module(
            model_config_dict={
                "fcnet_hiddens": [512, 512],
                "fcnet_activation": "tanh"
            }
        )
        .resources(
            num_gpus=0,
        )
        .callbacks(CustomCallbacks)
        .debugging(log_level="DEBUG")
    )
    algo = config.build()
    result = algo.train()

train_ppo()

Another problem is that I don’t know how to read out the reward I got: there is no reward in `result`, even though I can confirm that the `step` function of my environment returns a reward.

Thank you!!

BR

Hi @ZiyanLydia,

Your issue is this line here:

result = algo.train()

That will do exactly one iteration of rollouts followed by optimization.

To run more than one iteration, either put that call in a loop or, preferably, specify a stopping condition and use the Tune API.
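A minimal sketch of both options, reusing the `config` object from your snippet. Note this is an assumption-laden sketch: the exact result keys (e.g. `episode_reward_mean`) and the import location of `RunConfig` have changed between RLlib/Ray versions, so check the docs for the version you are on.

```python
from ray import train, tune

# Option 1: call train() yourself in a loop.
algo = config.build()
for i in range(100):  # 100 training iterations
    result = algo.train()
    # On the old API stack the mean episode reward is reported under
    # this key; newer versions nest metrics differently.
    print(i, result.get("episode_reward_mean"))

# Option 2 (preferred): let Tune drive the loop with a stop condition.
tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        stop={"training_iteration": 100},  # stop after 100 iterations
    ),
)
results = tuner.fit()
```

With Tune, the reward metrics are also logged to the results directory automatically, which should answer the second question about where the reward goes.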