Intended workflow for custom models using num_outputs

What’s the correct way to handle num_outputs in a custom model?

  1. It seems that the action space should be used to get the correct output size, so num_outputs seems redundant in that case.
  2. It’s not passed in to the model initializer in some cases (and the last layer is just the flattened features in the examples), so I’m wondering what’s the logic behind that and in what scenarios the if num_outputs check is needed.
  3. Wrappers change num_outputs in a somewhat confusing way, but I assume the num_outputs of the wrapper should be separate from the wrapped class’ one, e.g. from LSTMWrapper:
        # Add prev-action/reward nodes to input to LSTM.
        if self.use_prev_action:
            self.num_outputs += self.action_dim
        if self.use_prev_reward:
            self.num_outputs += 1

        self.lstm = nn.LSTM(
            self.num_outputs, self.cell_size, batch_first=not self.time_major)

        self.num_outputs = num_outputs

Hi!

  1. Usually we use the outputs of a model to parameterize a probability distribution over actions, with one neuron representing the mean and another representing the variance or standard deviation of one Gaussian distribution per action. So the connection between action_space and num_outputs is clear. But you can wrap models, and depending on how you do so, it makes sense to me that both are passed to many models. The LSTMWrapper that you talk about under (3) looks like a good example of that to me.
  2. As you said under (1), in many cases the information is redundant (but in others it's not?).
  3. The LSTMWrapper uses num_outputs in a way that is a little confusing:


# At this point, self.num_outputs is the number of nodes coming
# from the wrapped (underlying) model. In other words, self.num_outputs
# is the input size for the LSTM layer

First, num_outputs is used by the wrapped model and holds the number of its outputs.

# Add prev-action/reward nodes to input to LSTM.
if self.use_prev_action:
    self.num_outputs += self.action_dim
if self.use_prev_reward:
    self.num_outputs += 1

Then, we use num_outputs to add up how many input neurons the LSTM layer should have. This is where it gets a little confusing, since at this point num_outputs does not describe the outputs of this model at all; it temporarily holds the LSTM's input size.

# Set self.num_outputs to the number of output nodes desired by the
# caller of this constructor.
self.num_outputs = num_outputs

In the end num_outputs becomes what we expect it to be.
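The three stages above can be traced with a toy sketch (plain Python, all sizes invented; this mocks only the bookkeeping, not RLlib's real LSTMWrapper):

```python
# Toy bookkeeping sketch of the three num_outputs stages in an
# LSTM-style wrapper. All numbers are made up for illustration.

class ToyLstmWrapper:
    def __init__(self, wrapped_num_outputs, num_outputs, action_dim,
                 use_prev_action=True, use_prev_reward=True):
        # Stage 1: num_outputs of the wrapped model = LSTM input size.
        self.num_outputs = wrapped_num_outputs

        # Stage 2: grow the LSTM input by prev-action/prev-reward nodes.
        if use_prev_action:
            self.num_outputs += action_dim
        if use_prev_reward:
            self.num_outputs += 1
        self.lstm_input_size = self.num_outputs  # what nn.LSTM would receive

        # Stage 3: reset to the output size requested by the caller.
        self.num_outputs = num_outputs


w = ToyLstmWrapper(wrapped_num_outputs=256, num_outputs=4, action_dim=2)
print(w.lstm_input_size)  # 259 = 256 + 2 + 1
print(w.num_outputs)      # 4, the caller-requested output size
```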

So in the simplest case, the underlying model would want to know only one of the two, action or observation space, because one conditions the other. Some models do not care about the action_space and simply provide num_outputs output neurons, because they want RLlib to attach the last layer depending on the action sampling. But the LSTMWrapper needs to know the action_space and does not care about num_outputs. So in order to support both, you can pass both to the model API.

This is how I understand the code. Does it make sense to you?


Thanks @arturn for the explanation.


Usually we use the outputs of a model to parameterize a probability distribution over actions, with one neuron representing the mean and another representing the variance or standard deviation of one Gaussian distribution per action

I think this greatly depends on the action distribution, e.g. see here, the one that you mentioned is the # Box space -> DiagGaussian OR Deterministic. case if I’m not mistaken.
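For illustration, here is a rough sketch of how the required model output size depends on the chosen distribution (simplified and hypothetical; RLlib's real logic lives in the distributions' required_model_output_shape and handles many more cases, e.g. Tuple/Dict spaces):

```python
# Simplified sketch of how the action space maps to num_outputs for two
# common action distributions. Everything here is illustrative only.

def num_outputs_for(space_kind, action_dim):
    if space_kind == "discrete":
        # Categorical: one logit per discrete action.
        return action_dim
    if space_kind == "box":
        # DiagGaussian: one mean plus one log-std per action dimension.
        return 2 * action_dim
    raise NotImplementedError(space_kind)


print(num_outputs_for("discrete", 4))  # 4 logits
print(num_outputs_for("box", 3))       # 6 = 3 means + 3 log-stds
```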


But the LSTMWrapper needs to know the action_space and does not care about num_outputs.

I would assume that because of self.num_outputs += self.action_dim it does need both.


Digging a bit more into this, I think the main difference can be seen here, where the policy classes are being instantiated.

What I find weird is that if the model is customized then it doesn’t pass the num_outputs to the model’s constructor:

                self.model = make_model(self, obs_space, action_space, config)
                dist_class, _ = ModelCatalog.get_action_dist(
                    action_space, self.config["model"], framework=framework)

I would expect (similarly to the custom dist cases) something like this:

                dist_class, num_outputs = ModelCatalog.get_action_dist(
                    action_space, self.config["model"], framework=framework)
                self.model = make_model(self, obs_space, action_space, config,
                                        num_outputs=num_outputs)

This seems to imply that the num_outputs needs to be handled explicitly inside a custom policy’s constructor.

Maybe @sven1977 can shed some light on if this was intended and why? Or I’m just missing something obvious.

I think this greatly depends on the action distribution, e.g. see here, the one that you mentioned is the # Box space -> DiagGaussian OR Deterministic. case if I’m not mistaken.

Yes, I was indeed referring to the DiagGaussian, which I think is chosen by RLlib by default. Of course there are others!

I would assume that because of self.num_outputs += self.action_dim it does need both.

You are right, it does require both, but not until further down. self.num_outputs depends on the wrapped model in the line that you reference. Further down, the num_outputs that is passed to the LSTMWrapper's __init__ is actually used. But since it requires both num_outputs and action_space, this still answers (1).

I am not that certain regarding your last point.

Hi @vakker00,

I think two conceptually different things are being lumped together here. There are two types of customization. The most common type is defining a new neural network architecture to use as your model. In this case the architecture is customized, but the way RLlib builds that model is not.

The second type of customization is how RLlib builds the model. The make_model kwarg is a callable that allows you to customize that process. A quick look at this example will show you how it is used. If you look inside, they actually use the standard way to build two different networks and do in fact use num_outputs there.
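A minimal sketch of that second kind of customization (all class names here are stubs standing in for RLlib types; a real make_model receives the policy, spaces, and config, and would derive num_outputs from the actual action distribution rather than a hard-coded action_dim):

```python
# Illustrative-only sketch of a make_model callable that computes
# num_outputs itself before constructing the model, mimicking the
# standard build path. Stubs stand in for RLlib classes.

def get_action_dist_stub(action_dim):
    # Stands in for ModelCatalog.get_action_dist: a DiagGaussian needs
    # one mean plus one log-std per action dimension.
    return "DiagGaussianStub", 2 * action_dim


class MyModelStub:
    # Stands in for a ModelV2 subclass.
    def __init__(self, num_outputs):
        self.num_outputs = num_outputs


def make_model(policy, obs_space, action_space, config):
    action_dim = 3  # would be derived from action_space in real code
    _, num_outputs = get_action_dist_stub(action_dim)
    return MyModelStub(num_outputs)


model = make_model(None, None, None, {})
print(model.num_outputs)  # 6
```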

num_outputs is a variable that makes it easy to wrap one model in another. So in the case of an LSTMWrapper, self.num_outputs actually means the number of outputs coming out of the layer that is being automatically wrapped. If we have an FC network that is being wrapped, then its final layer is not the ultimate output layer; it is an input layer to the LSTM, and the LSTMWrapper will have the final output layer. In that case self.num_outputs expresses the number of units in the last layer before the LSTM, and the variable passed in as num_outputs is the number of values returned by a call to the LSTMWrapper's forward method. So the member variable is the input size and the function argument is the output size.

The reason that the action distribution does not take a num_outputs argument is that the action distributions used in RLlib are entirely defined by the shapes of their inputs and the action space when they are constructed.
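A toy version of that idea for a diagonal Gaussian (plain Python lists, no RLlib; the real DiagGaussian works on tensors): the inputs alone determine everything, so no separate num_outputs argument is needed.

```python
# Toy diagonal-Gaussian "distribution" whose parameters are fully
# determined by the shape of its inputs: the first half of the input
# vector are the means, the second half the log-stds.

def split_diag_gaussian_inputs(inputs):
    assert len(inputs) % 2 == 0, "inputs must split evenly into mean/log-std"
    half = len(inputs) // 2
    mean, log_std = inputs[:half], inputs[half:]
    return mean, log_std


mean, log_std = split_diag_gaussian_inputs([0.1, 0.2, -1.0, -1.0])
print(mean)     # [0.1, 0.2]
print(log_std)  # [-1.0, -1.0]
```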

@mannyv thanks for the clarification, it makes sense.

The reason that the action distribution does not take a num_outputs argument is that the action distributions used in RLlib are entirely defined by the shapes of their inputs and the action space when they are constructed.

The only thing I would add is that it's not clear why, here, the num_outputs calculated from the action distribution is not used in make_model. It would make sense if it were done the same way as it's implemented a couple of lines below (here).