Hello Ray Team!
I’m trying to run the trained model in the environment the “traditional” way (a plain reset/step loop). I’m using a custom version of the LunarLander environment with multiple continuous actions (spaces.Box).
But the action returned by compute_single_action has only one dimension.
What is the correct way to compute the action for a given observation/state?
Thank you!!
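For context, the action space of my custom environment is a multi-dimensional Box roughly like this (the shape and bounds below are only placeholders, not my real values):

import gym
import numpy as np
from gym import spaces

class MyLunarLander(gym.Env):
    def __init__(self):
        # Two continuous actions, each in [-1, 1]; shape and bounds are placeholders.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)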
from ray.rllib import agents

trainer = agents.ppo.PPOTrainer(.....)
trainer.evaluate()  # <== works fine after training!
...
env = MyLunarLander()
while True:
    episode_reward = 0
    done = False
    obs = env.reset()
    while not done:
        action = trainer.compute_single_action(obs)  # returns only one integer
        obs, reward, done, info = env.step(action)   # exception! IndexError: invalid index to scalar variable.
        episode_reward += reward
        env.render()
    print("Total Reward:", episode_reward)
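For what it’s worth, this is the kind of quick check I can run to debug it (just a sketch; explore=False is only there to make the sampled action deterministic):

# Compare what the policy was built with against what the env expects,
# and inspect the raw action coming back from the trainer.
print("env action space:   ", env.action_space)
print("policy action space:", trainer.get_policy().action_space)
action = trainer.compute_single_action(obs, explore=False)
print("returned action:", action, type(action))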