RLlib is auto-adjusting my action distribution

hossein836 · May 21, 2022, 10:30am

High: It blocks me from completing my task.
Hi, I created my own distribution based on the Normal distribution, as shown below. Everything works, but the value my distribution samples does not match what my env receives: for example, the sample is 0.5 but my env prints 3!
It seems the action is being adjusted based on my Box action space, because when I change low or high on my observation space the value shifts like an intercept. My question is: how can I stop that?

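from typing import List, Optional, Union

import gym
import numpy as np
import torch
from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import ModelConfigDict, TensorType


class TorchGaussian1(TorchDistributionWrapper):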

    @override(ActionDistribution)
    def __init__(
        self,
        inputs: List[TensorType],
        model: TorchModelV2,
        *,
        action_space: Optional[gym.spaces.Space] = None
    ):
        super().__init__(inputs, model)
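        # The model outputs are used directly as the mean of the Gaussian.
        mean = self.inputs
        print('mean', mean)
        # Fixed standard deviation, passed directly as the scale of the Normal
        # (it is a plain std, not a log-std).
        std = 0.0006
        self.std = std
        self.dist = torch.distributions.normal.Normal(mean, std)
        # Remember to squeeze action samples in case action space is Box(shape=())
        self.zero_action_dim = action_space is not None and action_space.shape == ()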

    @override(TorchDistributionWrapper)
    def sample(self) -> TensorType:
        sample = super().sample()
        # Print before any early return so the sample is always logged.
        print('sample', sample)
        if self.zero_action_dim:
            return torch.squeeze(sample, dim=-1)
        return sample

    @override(ActionDistribution)
    def deterministic_sample(self) -> TensorType:
        self.last_sample = self.dist.mean
        return self.last_sample

    @override(TorchDistributionWrapper)
    def logp(self, actions: TensorType) -> TensorType:
        return super().logp(actions).sum(-1)

    @override(TorchDistributionWrapper)
    def entropy(self) -> TensorType:
        return super().entropy().sum(-1)

    @override(TorchDistributionWrapper)
    def kl(self, other: ActionDistribution) -> TensorType:
        return super().kl(other).sum(-1)

    @staticmethod
    @override(ActionDistribution)
    def required_model_output_shape(
        action_space: gym.Space, model_config: ModelConfigDict
    ) -> Union[int, np.ndarray]:
        # One model output (the mean) per action dimension.
        return np.prod(action_space.shape, dtype=np.int32)

hossein836 · May 23, 2022, 9:55am

Guys, this distribution is somehow deterministic: my logit shows 0.002 but my action is 1.5, and I don’t know why. It really blocks me from completing my task.
At the lowest possible value of my logit I get 1.5, and at the highest possible value (1, because I used a sigmoid) I get 3.
My action space is: spaces.Box(low = 0, high = 3, shape=(1,))
It seems that at the lowest possible value I get the mean of my action space! But why?

hossein836 · May 23, 2022, 10:42am

It looks like this:

It is better than a naive Normal because domain knowledge often makes it clear how much std is acceptable for continuous actions, so fixing the std can save training resources. It is also better than a fully deterministic policy, because in many spaces it is not realistic to ask for one exact float value for an action. So deterministic + some noise is a good idea.
Here is the repro code:

hossein836 · May 24, 2022, 12:11pm

What happens if the sampled action falls outside the low and high bounds of the action space?
I need my action dist to work, and I don’t know where to look. Dear @sven1977, I am stuck in my project and don’t know how to solve this. As far as I can tell I did my part correctly; could you please explain what is happening?

hossein836 · May 26, 2022, 7:43am

That was simply because of 'normalize_actions' = True. It’s a default config and I didn’t set it myself, so I didn’t realize it could be the issue. Anyway, I wonder how (and where) it normalizes actions, since it doesn’t know the minimum and maximum of the logits in the first place. If anyone can explain, please do.
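
For anyone landing here later, below is a minimal sketch of what the normalization appears to do to a Box action. It is not RLlib's code: `unsquash_box_action` is a hypothetical helper name, and the formula is my understanding of RLlib's `unsquash_action` utility, which (as far as I can tell) is applied to the policy output before it is sent to the env. The key assumption is that RLlib does not look at the logit range at all; it simply assumes the sampled action is roughly in [-1, 1] and maps that interval linearly onto [low, high], then clips.

```python
import numpy as np


def unsquash_box_action(action, low, high):
    """Hypothetical helper: map an action assumed to lie in [-1, 1] onto [low, high]."""
    action = np.asarray(action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Linear map: -1 -> low, 0 -> midpoint of the space, +1 -> high.
    unsquashed = low + (action + 1.0) * (high - low) / 2.0
    # Values that were outside [-1, 1] end up clipped to the space bounds.
    return np.clip(unsquashed, low, high)


# Bounds taken from the action space in this thread: Box(low=0, high=3, shape=(1,)).
low, high = np.array([0.0]), np.array([3.0])
for raw in (0.0, 0.002, 1.0):
    print(raw, unsquash_box_action(np.array([raw]), low, high))
```

Under that assumption, a sigmoid output of 0 lands on the midpoint of the space (1.5) and an output of 1 lands on the upper bound (3), which matches the values reported above. Setting "normalize_actions": False in the trainer config should pass the sampled action through unchanged; "clip_actions" is a separate option that only clips to the bounds.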
