How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hi!
I noticed strange behavior with rllib evaluate on the command line: depending on the environment, the number of evaluation episodes requested on the command line is never reached during evaluation.
For example, running rllib evaluate <chkpt_path> --run PPO --env CartPole-v0 --episodes 100 only runs about 50-55 evaluation episodes. The count stays around 50-55 no matter how many episodes I request, as long as I request more than 55. I see the same problem with other environments and algorithms, each with a different "maximum" number of episodes.
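A minimal sketch of one possible explanation, stated as an assumption and not confirmed against RLlib's source: if rllib evaluate stops as soon as either an episode limit or a step limit is reached, and its step limit (the --steps option, assumed here to default to 10000) is active even when only --episodes is passed, then a trained CartPole-v0 agent that survives the full 200 steps of every episode can complete at most 10000 / 200 = 50 episodes. The functions keep_going and simulate_rollout below are hypothetical stand-ins written to illustrate that arithmetic; they are not RLlib's implementation.

```python
def keep_going(steps, num_steps, episodes, num_episodes):
    """Return False once either non-zero limit has been reached (0 = no limit)."""
    if num_episodes and episodes >= num_episodes:
        return False
    if num_steps and steps >= num_steps:
        return False
    return True


def simulate_rollout(episode_length, num_steps, num_episodes):
    """Count completed episodes when every episode lasts `episode_length` steps."""
    steps = 0
    episodes = 0
    while keep_going(steps, num_steps, episodes, num_episodes):
        episode_steps = 0
        done = False
        # Step until the episode ends or a limit is hit mid-episode.
        while not done and keep_going(steps, num_steps, episodes, num_episodes):
            steps += 1
            episode_steps += 1
            done = episode_steps >= episode_length
        # Only fully finished episodes are counted.
        if done:
            episodes += 1
    return episodes


if __name__ == "__main__":
    # CartPole-v0 caps episodes at 200 steps; a well-trained agent reaches the cap.
    print(simulate_rollout(200, num_steps=10000, num_episodes=100))  # 50
    print(simulate_rollout(200, num_steps=0, num_episodes=100))      # 100
```

If this assumption holds, episodes shorter than 200 steps would let a few more fit into the step budget, which would be consistent with the observed 50-55 range, and disabling the step limit (for example with --steps 0, if the CLI treats 0 as "no limit") would leave only the episode limit in effect.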
Here is a reproduction script:
import os
import subprocess

from ray import tune

# Train PPO on CartPole-v0 for 50 iterations, checkpointing every 25 iterations.
tune.run(
    "PPO",
    stop={"training_iteration": 50},
    config={"env": "CartPole-v0"},
    checkpoint_freq=25,
    checkpoint_score_attr="episode_reward_mean",
    checkpoint_at_end=True,
    local_dir="checkpoint",
)

# Pick the most recently modified trial directory under checkpoint/PPO.
os.chdir("checkpoint/PPO")
trial_dirs = [name for name in os.listdir(".") if os.path.isdir(name)]
chkpt_dir = max(trial_dirs, key=os.path.getmtime)

# Evaluate the final checkpoint, requesting 200 episodes.
subprocess.run(
    [
        "rllib", "evaluate",
        os.path.join(chkpt_dir, "checkpoint_000050", "checkpoint-50"),
        "--run", "PPO",
        "--env", "CartPole-v0",
        "--episodes", "200",
    ],
    check=True,
)