Solving Custom Gym Environment Termination Issue with tune.Tuner and Large Dataset

Hello. Please advise how to solve this problem.

I have a custom gym.Env with the following condition for ending an episode (df is a dataset with 100K rows):

         if self.current_step >= len(df):
             self.truncated = True
             self.terminated = True
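
For reference, here is a minimal stand-in for the environment (everything except the end-of-dataset condition is placeholder code, and it assumes the Gymnasium-style API since both terminated and truncated are set):

    import gymnasium as gym
    import numpy as np
    import pandas as pd

    # Stand-in for the real 100K-row dataset
    df = pd.DataFrame({"feature": np.random.randn(100_000)})

    class DatasetEnv(gym.Env):
        """Minimal placeholder env; only the end-of-data condition matches the real one."""

        observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)
        action_space = gym.spaces.Discrete(2)

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.current_step = 0
            self.terminated = False
            self.truncated = False
            return self._obs(), {}

        def step(self, action):
            reward = 0.0  # placeholder reward
            self.current_step += 1
            if self.current_step >= len(df):
                self.truncated = True
                self.terminated = True
            return self._obs(), reward, self.terminated, self.truncated, {}

        def _obs(self):
            i = min(self.current_step, len(df) - 1)
            return np.array([df["feature"].iloc[i]], dtype=np.float32)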

When I run tune.Tuner with this condition, I get the following output:

   episode_len_mean: .nan
   episode_media: {}
   episode_reward_max: .nan
   episode_reward_mean: .nan
   episode_reward_min: .nan
   episodes_this_iter: 0
   episodes_total: 0

However, when I run tune.Tuner with this change in gym.Env:

         if self.current_step >= 2880:
             self.truncated = True
             self.terminated = True

My output looks like this:

   episode_len_mean: 2859.0
   episode_media: {}
   episode_reward_max: 241.68575602662057
   episode_reward_mean: 229.21604599702798
   episode_reward_min: 216.74633596743539
   episodes_this_iter: 1
   episodes_total: 2

Please tell me what needs to be done so that everything works under the first condition, including training. The dataset itself is large, and I would like the model to be trained on the entire dataset.

By the way, agent_timesteps_total starts at 4000 and increases every iteration.

Hi @overloader,

You don’t have a termination issue as far as I can see from your description. Training on the dataset is proceeding normally.

With such a large batch size and a large num_sgd_iter, it takes a long time to complete an episode. In my opinion the default value for this (32) is quite high and really slows down training. I usually set it to a value between 10 and 15. I do not recommend 1, but sometimes I use it at the beginning of training a new environment or configuration as a sanity check and to produce timing estimates.
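
For example, lowering it would look something like this (just a sketch; the exact builder methods depend on your RLlib version, and "YourCustomEnv" is a placeholder for your registered env name):

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(env="YourCustomEnv")  # placeholder: your registered env name
        .training(num_sgd_iter=10)         # down from 32; 10-15 is the range I usually use
    )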

There are three values that work together to determine how many SGD updates happen per training iteration:

num_sgd_iter * (train_batch_size // sgd_minibatch_size).

With the defaults, that is 1024 SGD updates per training iteration. With your episode length, there would be 25 training iterations before you completed one episode.
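
To make the arithmetic concrete (train_batch_size=4000 is inferred from your agent_timesteps_total growing by 4000 per iteration):

    # SGD updates per training iteration, per the formula above
    def sgd_updates_per_iter(num_sgd_iter, train_batch_size, sgd_minibatch_size):
        return num_sgd_iter * (train_batch_size // sgd_minibatch_size)

    # Each training iteration collects roughly train_batch_size environment steps,
    # so one pass through the 100K-row dataset (one episode) takes about:
    episode_length = 100_000
    train_batch_size = 4_000
    print(episode_length // train_batch_size)  # -> 25 training iterations per episode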

You could also consider increasing your train batch size; that value will be constrained by system and GPU memory.


Hello, mannyv. Thank you for helping.

When I change train_batch_size to 100000 (the number of rows in the dataset), everything works fine and the reward is calculated. But it is very slow.

At the same time, when I set train_batch_size to something smaller, say 50000, the reported metrics show .nan again.

Perhaps there are some other options, besides setting train_batch_size to the size of the dataset in the gym env, so that training would continue without .nan values in the reward?

I tried this:


        .training(
            lr=0.001,
            clip_param=0.2,
            train_batch_size=50000,
            num_sgd_iter=25,
            sgd_minibatch_size=5000,
        )

but I still get .nan in the reward :frowning:

Hi @overloader,

Perhaps we should clarify something. In this case, the NaN you are seeing probably does not mean that the rewards collected from the environment are NaN, although in other cases it could.

What is happening here is that the episode_reward value does not update until an episode returns done. Until one full episode completes, the value shows as NaN. The actual rewards are probably not NaN; it is just the summary reporting metric that shows NaN because there is no data yet.
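
If you want to convince yourself of that, a quick sanity check outside of Tune is to step the environment directly and assert that the per-step rewards are finite (MyCustomEnv(df) is a placeholder for however you construct your env):

    import math

    env = MyCustomEnv(df)  # placeholder constructor for your environment
    obs, info = env.reset()
    terminated = truncated = False
    total_reward, steps = 0.0, 0

    # Run a bounded number of random-action steps; RLlib's episode_reward_* metrics
    # would still be NaN at this point, but the raw rewards should not be.
    while not (terminated or truncated) and steps < 10_000:
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        assert not math.isnan(reward), f"NaN reward at step {steps}"
        total_reward += reward
        steps += 1

    print(f"{steps} steps sampled, running reward sum = {total_reward}")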