RLlib is auto-adjusting my action distribution

  • High: It blocks me from completing my task.
    Hi, I created my own action distribution based on the Normal distribution, as shown below. Everything works, except that the value I sample from the distribution (for example 0.5) is not what arrives in my env, where I print 3!
    It seems to be adjusted based on my Box action space, because when I change low or high on my observation space the value shifts like an intercept. My question is: how can I stop that?
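# Imports assumed for an RLlib version that still ships the ModelV2 API.
from typing import List, Optional, Union

import gym
import numpy as np
import torch

from ray.rllib.models.action_dist import ActionDistribution
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import ModelConfigDict, TensorType


class TorchGaussian1(TorchDistributionWrapper):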

    @override(ActionDistribution)
    def __init__(
        self,
        inputs: List[TensorType],
        model: TorchModelV2,
        *,
        action_space: Optional[gym.spaces.Space] = None
    ):
        super().__init__(inputs, model)
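        # The model output is used directly as the mean of the Gaussian.
        mean = self.inputs
        print('mean', mean)
        # Fixed standard deviation chosen from domain knowledge. Normal()
        # takes the standard deviation itself, not its logarithm.
        std = 0.0006
        self.std = std
        self.dist = torch.distributions.normal.Normal(mean, std)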
        # Remember to squeeze action samples in case action space is Box(shape)
        self.zero_action_dim = action_space and action_space.shape == ()

    @override(TorchDistributionWrapper)
    def sample(self) -> TensorType:
        sample = super().sample()
        if self.zero_action_dim:
            return torch.squeeze(sample, dim=-1)
        print('sample',sample)
        return sample

    @override(ActionDistribution)
    def deterministic_sample(self) -> TensorType:
        self.last_sample = self.dist.mean
        return self.last_sample

    @override(TorchDistributionWrapper)
    def logp(self, actions: TensorType) -> TensorType:
        return super().logp(actions).sum(-1)

    @override(TorchDistributionWrapper)
    def entropy(self) -> TensorType:
        return super().entropy().sum(-1)

    @override(TorchDistributionWrapper)
    def kl(self, other: ActionDistribution) -> TensorType:
        return super().kl(other).sum(-1)

    @staticmethod
    @override(ActionDistribution)
    def required_model_output_shape(
        action_space: gym.Space, model_config: ModelConfigDict
    ) -> Union[int, np.ndarray]:
        # The model only outputs the mean (the std is fixed), so it needs
        # one output per action dimension.
        return np.prod(action_space.shape, dtype=np.int32)

Guys, this distribution is somehow deterministic: my logit shows 0.002 but my action is 1.5, and I don’t know why. It really blocks me from completing my task.
At the lowest possible value of my logit I get 1.5, and at the highest possible value (1, because I used a sigmoid) I get 3.
My action space is: spaces.Box(low = 0, high = 3, shape=(1,))
It seems that at the lowest possible value I get the mean of my action space! But why?

It looks like this:


It is better than a naïve Normal because domain knowledge often tells you how much std is acceptable for continuous actions, so fixing it can save training resources. It is also better than a fully deterministic policy, because asking for one exact float as an action is not realistic in many spaces. So deterministic plus some noise is a good idea.
Here is the repro code:

What will happen if the sampled action falls outside the low and high of the action space?
I need my action dist to work, and I don’t know where to look. Dear @sven1977, I am stuck in my project and don’t know how to solve this. As far as I can tell I did my part; could you please explain what is happening?

That was simply because of 'normalize_actions' = True. It’s a default config and I didn’t pass it myself, so I didn’t realize it could be the issue. Anyway, I still wonder how (and where) it normalizes actions, since it doesn’t know the minimum and maximum of the logits in the first place. If anyone can explain, please do.
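
For anyone who lands here with the same symptom: as far as I understand it, with 'normalize_actions' = True RLlib does not look at the range of the logits at all. It assumes the sampled action lies roughly in [-1, 1] and maps it linearly onto the Box bounds before the action reaches the env. The sketch below is a plain-Python re-implementation of that mapping, not RLlib's own code, and `unsquash` is a hypothetical helper name. It reproduces the numbers above for Box(low = 0, high = 3).

```python
def unsquash(a: float, low: float, high: float) -> float:
    """Map a value assumed to lie in [-1, 1] linearly onto [low, high]."""
    scaled = low + (a + 1.0) * (high - low) / 2.0
    # Inputs outside [-1, 1] would land outside the bounds, so clip them.
    return min(max(scaled, low), high)


low, high = 0.0, 3.0
# A sample near 0 lands near the midpoint of the space (about 1.5),
# and a sample of 1 lands exactly on the upper bound (3).
for sample in (0.002, 1.0):
    print(sample, "->", unsquash(sample, low, high))
```

Under this assumption, a sigmoid output in [0, 1] can only ever reach the upper half of the action space, which matches what I saw. Passing "normalize_actions": False in the config hands the sampled value to the env unchanged. In that case it may be worth checking the "clip_actions" setting as well, so that samples outside low and high are still handled.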
