Hello Ray Team!
I’m trying to run the trained model in the environment the “traditional” way (a plain reset/step loop). I’m using a custom version of the LunarLander environment with multiple continuous actions (spaces.Box).
But the action returned by compute_single_action has only one dimension.
What is the correct way to compute the action for a given observation/state?
Thank you!!
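For context, the action space of my custom environment is a multi-dimensional Box roughly like this (the shape and bounds below are only placeholders, not my real values):

import gym
import numpy as np
from gym import spaces

class MyLunarLander(gym.Env):
    def __init__(self):
        # Two continuous actions, each in [-1, 1]; shape and bounds are placeholders.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(8,), dtype=np.float32)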
from ray.rllib import agents

trainer = agents.ppo.PPOTrainer(.....)
trainer.evaluate()  # <== works fine after training!
...
env = MyLunarLander()
while True:
    episode_reward = 0
    done = False
    obs = env.reset()
    while not done:
        action = trainer.compute_single_action(obs)  # returns only one integer
        obs, reward, done, info = env.step(action)   # exception! IndexError: invalid index to scalar variable.
        episode_reward += reward
        env.render()
    print("Total Reward:", episode_reward)
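For what it’s worth, this is the kind of quick check I can run to debug it (just a sketch; explore=False is only there to make the sampled action deterministic):

# Compare what the policy was built with against what the env expects,
# and inspect the raw action coming back from the trainer.
print("env action space:   ", env.action_space)
print("policy action space:", trainer.get_policy().action_space)
action = trainer.compute_single_action(obs, explore=False)
print("returned action:", action, type(action))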