Hi there, I’m using a custom environment with a tuple (gym space) action space.
TL;DR - I’m not sure how to construct the model’s output in the forward function.
Thank you! The first example helps, but my specific problem is with a custom tf model: I don’t know how to define the type and shape of the forward pass output.
I guess I will debug a simple case with a non-custom model to understand it, but if someone has a reference, that would be a great help.
Hey @Ofir_Abu , your Tuple action space causes RLlib to choose a MultiActionDistribution for the model’s output: the model parameterizes this distribution type and outputs the corresponding number of nodes. Inside the distribution, the model’s output values are split according to the individual sub-spaces (2x DiscreteWDtype), and an action is then sampled for each of the two sub-spaces from its share of the logits produced by your model.
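To make the splitting concrete, here is a minimal pure-Python sketch of the idea. It is not RLlib’s actual code: the helper names (split_logits, sample_categorical, sample_tuple_action) and the sub-space sizes 3 and 5 are assumptions made up for illustration. The point is that, for a Tuple of two discrete sub-spaces, the forward pass should return one flat float vector per sample whose length is the sum of the sub-space sizes (here 3 + 5 = 8).

```python
import math
import random


def split_logits(flat_logits, sizes):
    """Split one flat logits vector into one chunk per sub-space."""
    if len(flat_logits) != sum(sizes):
        raise ValueError("model output length must equal the sum of sub-space sizes")
    chunks, start = [], 0
    for n in sizes:
        chunks.append(flat_logits[start:start + n])
        start += n
    return chunks


def sample_categorical(logits, rng):
    """Sample an index with probability softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_tuple_action(flat_logits, sizes, rng=None):
    """Sample one action per sub-space from its own slice of the logits."""
    if rng is None:
        rng = random.Random()
    return tuple(sample_categorical(c, rng) for c in split_logits(flat_logits, sizes))


# Hypothetical sizes: two discrete sub-spaces with 3 and 5 actions.
sizes = [3, 5]
# What the model's forward pass would emit for a single sample: 8 logits.
flat_logits = [0.1, 2.0, -1.0, 0.0, 0.5, 0.5, -2.0, 1.0]
action = sample_tuple_action(flat_logits, sizes, random.Random(0))
print(action)  # a 2-tuple: (index in range(3), index in range(5))
```

In a real model the same layout applies per batch row, so the forward output would have shape [batch_size, 8] under these assumed sizes.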
You can debug into your forward pass by setting a breakpoint in e.g. rllib/models/torch/torch_action_dist::TorchMultiActionDistribution::sample() (or the respective tf version) AND setting local_mode=True in your call to ray.init(), so that everything runs in a single process and the breakpoint is actually hit.