Compute actions programmatically

Hello Ray Team!

I’m trying to run the trained model in the environment the “traditional” way, with my own step loop. I’m using a custom version of the LunarLander environment with a multi-dimensional, continuous action space (spaces.Box).

But the action returned by the compute_single_action function has only one dimension.

What is the best way to compute the action for an observation/state?

Thank you!!

trainer = agents.ppo.PPOTrainer(.....)
trainer.evaluate()  # <== works fine after training!

...

env = MyLunarLander()
while True:
    episode_reward = 0
    done = False
    obs = env.reset()
    while not done:
        action = trainer.compute_single_action(obs)  # returns only a single integer
        obs, reward, done, info = env.step(action)  # exception! IndexError: invalid index to scalar variable.
        episode_reward += reward
        env.render()

    print("Total Reward:", episode_reward)

Hi @fdmartins, and welcome to the discussion board!

The error you are encountering cannot be reproduced because part of the code is missing. In particular, there is no way to tell what your MyLunarLander environment does internally. Please share more code (at least the environment's action space definition and its step() method) if you want answers that help with your problem.
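The error message itself is a clue, though. `IndexError: invalid index to scalar variable.` is what NumPy raises when code indexes a NumPy scalar, for example a `step()` method that reads `action[0]` and `action[1]` from what it assumes is a vector (the stock continuous LunarLander does this). A scalar coming out of `compute_single_action` most likely means the policy was built for a different action space than the `spaces.Box` your environment declares, so a first check is to compare `trainer.get_policy().action_space` with `env.action_space`. The sketch below is a hypothetical guard; the name `to_action_vector` and its `expected_dim` argument are illustrative and not part of RLlib. It turns the opaque error into a descriptive one before `env.step()` is called.

```python
from typing import List


def to_action_vector(action, expected_dim: int) -> List[float]:
    """Hypothetical helper (not an RLlib API): validate a policy output
    before passing it to env.step().

    Raises a descriptive ValueError instead of the opaque
    "IndexError: invalid index to scalar variable." that appears when an
    environment indexes action[0], action[1], ... on a scalar.
    """
    try:
        # Python ints/floats, NumPy scalars and 0-d arrays all raise TypeError here.
        n = len(action)
    except TypeError:
        raise ValueError(
            f"expected an action vector of length {expected_dim}, "
            f"got scalar {action!r}; was the policy trained with a "
            "different (e.g. Discrete) action space?"
        ) from None
    if n != expected_dim:
        raise ValueError(
            f"expected an action vector of length {expected_dim}, got length {n}"
        )
    # Return a plain list of floats so the caller gets a predictable type.
    return [float(a) for a in action]


if __name__ == "__main__":
    print(to_action_vector([0.5, -0.2], expected_dim=2))
    try:
        to_action_vector(3, expected_dim=2)
    except ValueError as err:
        print(err)
```

Inside the loop from the question this would be used as `action = to_action_vector(trainer.compute_single_action(obs), expected_dim=env.action_space.shape[0])`, assuming a one-dimensional Box. It does not fix a mismatched policy; it only makes the mismatch visible at the point where it happens.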