How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a custom environment that I'm training with the SAC config and tuner.fit(). The environment has a max_episode_steps = 200 variable, and when the step count reaches it, truncated is set to True. I assume this counts as a completed episode, so the reward metrics should be calculated. However, the episode count in progress.csv is always 0 even though the numbers of steps sampled and trained keep increasing, all the reward metrics such as env_runner/episode_reward_mean are NaN, and hist_stats is empty.
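For reference, the behaviour I'm expecting is the standard Gymnasium truncation convention, illustrated here with CartPole rather than my env (sketch only, not my actual code):

import gymnasium as gym

# Illustrative only: with a time limit, truncated=True should close the episode
# just like terminated=True, and the finished episode should feed the reward metrics.
env = gym.make("CartPole-v1", max_episode_steps=200)
obs, info = env.reset(seed=0)
for t in range(300):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        print(f"episode ended at step {t + 1}: terminated={terminated}, truncated={truncated}")
        obs, info = env.reset()
env.close()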
Here is my step function:
def step(self, action):
    self.steps += 1
    print(self.steps)

    # Map the discrete action to the value expected by the simulator.
    action = 2 ** action
    actions = {self.agentId: action}
    if math.isnan(action):
        print("====================================== action passed is nan =========================================")
        print("STEPS: " + str(self.steps))

    # Advance the underlying simulator and cast observations to float32.
    obs, rewards, dones, info_ = self.runner.step(actions)
    for key, value in obs.items():
        obs[key] = np.asarray(value, dtype=np.float32)
    print("observations: ", obs)
    print("dones:", dones)
    print("info:", info_)
    print("rewards:", rewards)

    if dones[self.agentId]:
        self.runner.shutdown()
        self.runner.cleanup()

    if math.isnan(rewards[self.agentId]):
        print("====================================== reward returned is nan =========================================")
    reward = round(rewards[self.agentId], 4)
    print("REWARD: " + str(reward))

    if any(np.isnan(np.asarray(obs[self.agentId], dtype=np.float32))):
        print("====================================== obs returned is nan =========================================")

    # Append the new observation to the rolling observation buffer.
    obs = obs[self.agentId]
    self.currentRecord = obs
    self.obs.extend(obs)
    obs = np.asarray(list(self.obs), dtype=np.float32)

    if info_['simDone']:
        dones[self.agentId] = True

    # Truncate the episode once max_episode_steps (200) is reached.
    if self.steps >= self.max_episode_steps:
        truncated = True
        self.runner.shutdown()
        self.runner.cleanup()
    else:
        truncated = False

    return obs, reward, dones[self.agentId], truncated, {}
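As a sanity check outside RLlib, a loop like this should confirm whether the env really returns truncated=True at step 200 (sketch only; OmnetppEnv and env_config are my own class and dict from the configs below):

# Standalone sanity check, not part of the training script: step the raw env and
# verify that the truncation branch is actually reached at max_episode_steps.
env = OmnetppEnv(env_config)  # my custom env class, registered as "OmnetppEnv"
obs, info = env.reset()
for t in range(250):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        print(f"episode ended at step {t + 1}: terminated={terminated}, truncated={truncated}")
        break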
My environment can be reset, but a reset takes a while. When I set a condition inside the environment that sets done to True, it takes very long and not even a single training iteration completes, even though the environment is resetting.
Here is my algorithm configuration:
config = (SACConfig()
    .env_runners(num_env_runners=2, rollout_fragment_length=200)
    .resources(num_gpus=1)
    .environment("OmnetppEnv", env_config=env_config)
    .evaluation(evaluation_config=evaluation_config)
)
tuner = tune.Tuner(
    "SAC",
    run_config=air.RunConfig(
        stop={"timesteps_total": 10000},
        name="SAC_1",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=100,
            checkpoint_at_end=True,
        ),
    ),
    param_space=config,
)
results = tuner.fit()
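For completeness, the same numbers can also be read from the returned ResultGrid, roughly like this (metric key names assumed to match the progress.csv columns):

# Rough post-run check; key names assumed from the result dict / progress.csv columns.
result = results[0]
print(result.metrics.get("episodes_total"))       # stays at 0 in my runs
print(result.metrics.get("episode_reward_mean"))  # NaN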
Things I’ve already tried:
- used batch_mode = "complete_episodes"
- set terminated to True after some steps. With this, not even a single training iteration completes and progress.csv is never created; I think a worker restarts before the environment can reach terminated = True.
- used an eval_env_runner with evaluation_duration_unit = "timesteps". This kept making the worker crash. The config for that attempt:
config = (SACConfig()
    .debugging(seed=1)
    .env_runners(num_env_runners=2, rollout_fragment_length=200)  # , batch_mode="complete_episodes", horizon=400)
    .resources(num_gpus=1)
    .environment("OmnetppEnv", env_config=env_config)
    .evaluation(
        evaluation_config=evaluation_config,
        evaluation_num_env_runners=1,
        evaluation_interval=1,  # evaluate every training iteration
        evaluation_duration=200,
        evaluation_duration_unit="timesteps",
        evaluation_sample_timeout_s=None,
        evaluation_force_reset_envs_before_iteration=True,
    )
)
- added a print(episodes) in metrics.py; it is always 0, so I think the problem is that episodes are never counted, but I'm not sure how to fix it.
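One more debugging idea: attaching a callback to see whether RLlib ever registers a finished episode at all (sketch only; the on_episode_end signature is assumed from the DefaultCallbacks API):

from ray.rllib.algorithms.callbacks import DefaultCallbacks

class EpisodeDebugCallbacks(DefaultCallbacks):
    # If this never prints, RLlib never closes an episode, which would explain
    # the zero episode count and the NaN reward metrics.
    def on_episode_end(self, *, episode=None, **kwargs):
        print("on_episode_end fired:", episode)

# attached via: config = config.callbacks(EpisodeDebugCallbacks)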