How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
I’m training an agent on an external simulation engine for which I’ve made a custom environment. This environment takes two parameters:
- the number of steps in an episode, and
- the amount of time each step takes.
I need to train the agent on multiple values for each parameter.
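For context, the environment reads both values out of `env_config`. A stripped-down sketch of it (the real engine calls are replaced with placeholders, and the spaces are simplified):

```python
import gymnasium as gym
import numpy as np

class Pulse(gym.Env):
    """Simplified sketch of my custom environment (engine calls omitted)."""

    def __init__(self, env_config):
        # The two swept parameters arrive through env_config
        self.max_steps = env_config["steps"]  # number of steps per episode
        self.step_time = env_config["time"]   # amount of time each step takes
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,))
        self._t = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._t = 0
        return np.zeros(4, dtype=np.float32), {}

    def step(self, action):
        self._t += 1
        obs = np.zeros(4, dtype=np.float32)  # placeholder for engine state
        reward = 0.0                         # placeholder for engine reward
        terminated = False
        truncated = self._t >= self.max_steps
        return obs, reward, terminated, truncated, {}
```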
Up to now, I’ve been saving a checkpoint at the end of a run, building a new PPOConfig, loading the checkpoint, and then calling train(), all in a nested loop.
Conceptually, something like this:
```python
from ray.rllib.algorithms.ppo import PPOConfig

train_config = {}
steps = [800, 600, 400]
times = [5, 3, 1]
checkpoint = None
for step in steps:
    for t in times:
        train_config['steps'] = step
        train_config['time'] = t
        pulse = PPOConfig().environment(env=Pulse, env_config=train_config).build()
        if checkpoint is not None:
            pulse.restore(checkpoint)
        while True:
            result = pulse.train()
            # stop this run once the best episode clears the threshold
            if result['episode_reward_max'] > step * 0.095:
                break
        # save() checkpoints the algorithm; on recent Ray versions it returns
        # a result object whose .checkpoint.path is the directory to restore from
        checkpoint = pulse.save().checkpoint.path
```
That seems to work OK, but I can’t help thinking that I’m doing too much manually. There has to be a better way.
To that end, I read on the forum here that running through Tune is the “right” way, so I changed my code to this:
```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

train_config = {}
steps = [800, 600, 400]
times = [5, 3, 1]
checkpoint = None
for step in steps:
    for t in times:
        train_config['steps'] = step
        train_config['time'] = t
        pulse = PPOConfig().environment(env=Pulse, env_config=train_config)
        analysis = tune.run(
            "PPO",
            name="new_api_loop",
            config=pulse.to_dict(),  # tune.run expects a plain config dict
            restore=checkpoint,
            stop={"env_runners/episode_return_max": step * 0.095},
            checkpoint_at_end=True,
        )
        checkpoint = analysis.get_last_checkpoint().path
```
So, which of those two methods is better and, more importantly, is there some other way I should be training? I would really love to get rid of that nested loop, but I can’t figure out how to initialize the environment with differing parameters.
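The closest thing I’ve found is tune.grid_search, which I think could replace the loop by putting both sweeps into a single call. An untested sketch of what I mean (the experiment name "grid_sweep" is just a placeholder):

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# Untested sketch: let Tune expand the 3x3 parameter grid instead of my nested loop.
config = (
    PPOConfig()
    .environment(
        env=Pulse,
        env_config={
            "steps": tune.grid_search([800, 600, 400]),
            "time": tune.grid_search([5, 3, 1]),
        },
    )
)

analysis = tune.run(
    "PPO",
    name="grid_sweep",  # placeholder experiment name
    config=config.to_dict(),
    checkpoint_at_end=True,
    # stop=?  <- the reward threshold would need to vary per trial
)
```

But with the grid expanded by Tune, I don’t see how to make the stop threshold depend on each trial’s "steps" value (maybe a callable stop(trial_id, result) that reads the trial’s env_config?), nor how to chain the checkpoint from one trial into the next.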
Thanks