compute_single_action with explore=False returns the same result

Is it normal, with APPO and an attention net, for compute_single_action to return the same action for different observations when explore=False?

I have also tried using a model trained for only a single iteration to get more varied results, but consecutive calls to compute_single_action with different observations still return the same action:

action, state_out, _ = self.trainer.compute_single_action(
    obs, state=self.state_list, explore=False
)
# Roll the attention memory: append the newest state output and drop the
# oldest row, keeping a fixed-length window per transformer unit.
self.state_list = [
    np.concatenate((self.state_list[i], [state_out[i]]))[1:]
    for i in range(self.transformer_length)
]
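
For reference, RLlib's attention-net example rolls the memory the same way; the difference is often in how the initial state is built. A minimal sketch of that pattern, where the three model settings are assumptions that must match your training config, and algo and obs stand in for your trained APPO Algorithm and a single observation:

import numpy as np

# Assumed model settings (must match the values used during training):
num_transformer_units = 1   # model["attention_num_transformer_units"]
memory_inference = 50       # model["attention_memory_inference"]
attention_dim = 64          # model["attention_dim"]

# Initial attention memory: one (memory_inference, attention_dim) zero
# block per transformer unit.
state = [
    np.zeros((memory_inference, attention_dim), np.float32)
    for _ in range(num_transformer_units)
]

action, state_out, _ = algo.compute_single_action(obs, state=state, explore=False)

# Append the newest state output and drop the oldest timestep, so the
# memory window always stays memory_inference rows long.
state = [
    np.concatenate([state[i], [state_out[i]]])[1:]
    for i in range(num_transformer_units)
]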

Without explore=False it returns different actions, but I believe it is then just sampling randomly from the action distribution rather than acting on what the policy has learned.
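
To see the distinction, compare repeated calls on the same observation. A hypothetical check, reusing trainer, obs, and state from the snippet above:

# explore=True samples from the action distribution the policy outputs
# (a Gaussian for a continuous Box space), so two calls on the SAME
# observation will usually differ:
a1, _, _ = trainer.compute_single_action(obs, state=state, explore=True)
a2, _, _ = trainer.compute_single_action(obs, state=state, explore=True)

# explore=False returns the distribution's deterministic mode (the mean),
# so repeated calls on the same observation are identical by design.
# Identical actions across DIFFERENT observations, however, suggest the
# learned mean is flat or the inputs are not reaching the model.
d1, _, _ = trainer.compute_single_action(obs, state=state, explore=False)
d2, _, _ = trainer.compute_single_action(obs, state=state, explore=False)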

This is my observation space:

self.observation_space = gym.spaces.Dict({
    "data": gym.spaces.Box(low=-8.0, high=8.0, shape=(self.data_size,), dtype=np.float32),
    "h1": gym.spaces.Box(low=-2.1, high=2.1, shape=(15,), dtype=np.float32),
    "h2": gym.spaces.Box(low=-1.1, high=1.1, shape=(10,), dtype=np.float32),
})

And this is the action space:

self.action_space = gym.spaces.Box(
    low=0.0, high=1.0, shape=(2 * 3,), dtype=np.float32
)

Ray v2.20
Python 3.10.10
Windows 11

I would also like to get more info about this behavior. When serving an LSTM, deactivating exploration leads to the same action despite different observations. My assumption was that the model was trying to replay the trained policy step by step, but I don't think that's the case.
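
For what it's worth, the usual LSTM serving pattern is to seed the state from the policy once per episode and then feed each call's state output back into the next call. A minimal sketch under that assumption, where algo and env are placeholders for your trained Algorithm and gymnasium environment:

policy = algo.get_policy()
state = policy.get_initial_state()  # zeroed [h, c] at episode start

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    # state_out from this step becomes the state input for the next step;
    # resetting it every call would make each observation look like step 0.
    action, state, _ = algo.compute_single_action(
        obs, state=state, explore=False
    )
    obs, reward, terminated, truncated, info = env.step(action)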


Did you find a solution? Have you tried it on Linux?