Hello Ray Team!
I’m trying to run the trained model in the environment the “traditional” way, with my own step loop. I’m using a custom version of the LunarLander environment with multiple continuous actions (spaces.Box).
But the action returned by the compute_single_action function has only one dimension.
What is the best way to compute the action for an observation/state?
```python
from ray.rllib import agents

trainer = agents.ppo.PPOTrainer(.....)
trainer.evaluate()  # <== works fine after train!
...
env = MyLunarLander()
while True:
    episode_reward = 0
    done = False
    obs = env.reset()
    while not done:
        action = trainer.compute_single_action(obs)  # returns only one integer
        obs, reward, done, info = env.step(action)  # exception! IndexError: invalid index to scalar variable.
        episode_reward += reward
        env.render()
    print("Total Reward:", episode_reward)
```
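To make the failure easier to diagnose, here is a minimal sketch of a hypothetical helper, as_box_action (not part of RLlib or Gym), that checks the policy output before it reaches env.step. It assumes the environment's action space is a one-dimensional spaces.Box. A bare integer coming back from compute_single_action is consistent with a policy built for a Discrete action space instead of the Box space of MyLunarLander, so one thing worth checking is which env/action space the trainer was configured with. That is a plausible cause, not a confirmed one.

```python
from typing import Any, List


def as_box_action(action: Any, action_dim: int) -> List[float]:
    """Hypothetical helper: return `action` as a flat list of `action_dim` floats.

    Assumes the environment uses a 1-D Box action space with `action_dim`
    components. Raises ValueError with a descriptive message instead of
    letting env.step() fail later with an IndexError.
    """
    try:
        # Scalars (int, float, numpy scalars, 0-d arrays) are not iterable.
        items = list(action)
    except TypeError:
        raise ValueError(
            f"Policy returned a scalar ({action!r}) but the environment expects "
            f"{action_dim} continuous values; check that the trainer was built "
            "with the same Box action space as the environment."
        ) from None
    if len(items) != action_dim:
        raise ValueError(f"Expected {action_dim} action values, got {len(items)}")
    # Convert each component to a plain float for the environment.
    return [float(a) for a in items]
```

Inside the loop it could be used as `action = as_box_action(trainer.compute_single_action(obs), env.action_space.shape[0])`. If the custom environment expects a NumPy array rather than a list, wrap the result with numpy.asarray before calling env.step.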