Function Parameters in Customizing Models in Ray RLLib

Yiming_Dong · November 5, 2023, 12:40pm

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

I started using Ray RLlib (version 2.7.1) to solve an RL problem with a customized environment and agents. The envs are set up properly, but when I started customizing models I got quite confused and cannot proceed.

1. What do the parameters in each method mean?
As shown in the official docs, one can customize a PyTorch model like this:

import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class MyModel(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space,
                              num_outputs, model_config, name)
        nn.Module.__init__(self)

        # __init__ function logic
        ...

    def forward(self, input_dict, state, seq_lens):
        # forward function logic
        ...

    def value_function(self):
        # value function logic
        ...

However, the official docs do not explain what each of these parameters means. Where can I find detailed explanations?

2. How does _disable_preprocessor_api behave?
What confuses me most is that the obs_space parameter in the __init__ method differs from the observation space I defined. Also, the input_dict parameter of the forward method contains the observation input_dict['obs'], whose data structure is unclear to me. What's more, there seems to be a discrepancy between the default behaviour described in the official docs and the behaviour I see in my case.
Let’s be specific. My observation space is:

import numpy as np
from gymnasium.spaces import Box, Dict, Discrete, Tuple

# Inside RLSearchEnv.__init__:
operation_space = Dict({
    'a': Discrete(13),
    'b': Tuple([
        Dict({
            'b1': Discrete(3),
            'b2': Box(low=-10, high=10, shape=(1,), dtype=np.float64),
            'b3': Discrete(7)
        }) for _ in range(2)
    ]),
    'c': Discrete(7)
})
self.observation_space = Tuple([operation_space] * 10)
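To see where a flattened size of 420 comes from, here is a small pure-Python sketch of the arithmetic. It mirrors the space above with hypothetical stand-ins (tuples tagged "Discrete"/"Box", dicts for Dict, lists for Tuple, and a made-up `flat_size` helper) rather than real gymnasium objects, and it assumes each Discrete(n) contributes a one-hot vector of length n and each Box contributes the product of its shape. Under that assumption one operation flattens to 42 values and the full observation to 420. For the real space, `gymnasium.spaces.flatdim(...)` should give the same count.

```python
# Hypothetical stand-ins for the gymnasium spaces: ("Discrete", n), ("Box", shape),
# dict for Dict, list for Tuple. Used only to do the size arithmetic.

def flat_size(space):
    """Length of the one-hot encoded, flattened version of a nested space."""
    if isinstance(space, dict):    # Dict: sum over its sub-spaces
        return sum(flat_size(sub) for sub in space.values())
    if isinstance(space, list):    # Tuple: sum over its elements
        return sum(flat_size(sub) for sub in space)
    kind, arg = space
    if kind == "Discrete":         # Discrete(n) -> one-hot vector of length n
        return arg
    if kind == "Box":              # Box(shape) -> product of the shape dims
        size = 1
        for dim in arg:
            size *= dim
        return size
    raise ValueError(f"unknown space: {space!r}")


operation_space = {
    "a": ("Discrete", 13),
    "b": [{"b1": ("Discrete", 3), "b2": ("Box", (1,)), "b3": ("Discrete", 7)}
          for _ in range(2)],
    "c": ("Discrete", 7),
}
observation_space = [operation_space] * 10

print(flat_size(operation_space))    # 13 + 2 * (3 + 1 + 7) + 7 = 42
print(flat_size(observation_space))  # 10 * 42 = 420
```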

My config looks like this:

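from ray.rllib.models import ModelCatalog
from ray.tune.registry import get_trainable_cls

# Register the custom model under the name used in the config below.
ModelCatalog.register_custom_model("my_model", MyModel)

# RLSearchEnv and RLSearchEnv_config are defined elsewhere in my code.
config = (
    get_trainable_cls('PPO')
    .get_default_config()
    .rl_module(_enable_rl_module_api=False)
    .training(
        model={
            "_disable_preprocessor_api": True,
            "custom_model": "my_model",
            # "custom_model_config": {
            #     "input_files": args.input_files,
            # },
        },
        _enable_learner_api=False
    )
    .environment(RLSearchEnv, env_config=RLSearchEnv_config)
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=2)
    .experimental(_disable_preprocessor_api=True)
)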

There are two _disable_preprocessor_api settings in the config. Although the official docs describe them in the same way, they behave differently when I set them to different values:
Case 1. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=False). This is RLlib's default behaviour, and the default preprocessors are applied.
The obs_space is Box(-1.0, 1.0, (420,), float32), which is the one-hot encoded and flattened version of my original definition. I've checked that the sizes match.
The input_dict['obs'] preserves the original nested structure of my observation space (i.e. it is a list of length 10), but each Discrete subspace is now a one-hot encoded torch.Tensor with an additional batch dimension:

>>> input_dict['obs'][0]['a'].shape 
torch.Size([32, 13])
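A minimal sketch of the two Discrete layouts reported in this post: one-hot encoded in case 1 ([32, 13]) versus raw indices in cases 3 and 4 ([32]). It uses plain Python lists in place of torch tensors, and `one_hot` is a hypothetical helper written for illustration. In PyTorch, `torch.nn.functional.one_hot(x, num_classes=13)` performs the same encoding on a tensor of integer indices.

```python
import random


def one_hot(index, n):
    """One-hot encode a single Discrete(n) value as a list of floats."""
    vec = [0.0] * n
    vec[index] = 1.0
    return vec


batch_size = 32
# Raw Discrete(13) indices for sub-space 'a', one per batch row
# (the layout reported for cases 3 and 4: shape [32]).
raw = [random.randrange(13) for _ in range(batch_size)]

# The same values one-hot encoded (the layout reported for case 1: shape [32, 13]).
encoded = [one_hot(v, 13) for v in raw]

print(len(raw))                       # 32
print(len(encoded), len(encoded[0]))  # 32 13
```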

Case 2. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=False)
The obs_space is Box(-1.0, 1.0, (420,), float32), the same as in case 1.
The input_dict['obs'] is now a torch.Tensor with shape [32, 420]. My guess is that it flattens the observation and prepends a batch dimension.
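The [32, 420] shape in case 2 is consistent with that guess: each observation flattened into one vector, with the vectors stacked into a batch. The sketch below flattens one made-up observation that way in plain Python. It assumes Discrete leaves are one-hot encoded, Box values are copied unchanged, and Dict keys are visited in sorted order (gymnasium's Dict stores its keys sorted; here that coincides with the definition order). `flatten_obs` and the sample values are hypothetical; this illustrates the shapes, not RLlib's actual preprocessor code.

```python
def flatten_obs(obs, space):
    """Flatten one nested observation into a flat list of floats.

    Assumptions: Discrete leaves are one-hot encoded, Box leaves are copied
    unchanged, Dict keys are visited in sorted order, Tuple items in order.
    """
    if isinstance(space, dict):
        out = []
        for key in sorted(space):
            out += flatten_obs(obs[key], space[key])
        return out
    if isinstance(space, list):
        out = []
        for sub_obs, sub_space in zip(obs, space):
            out += flatten_obs(sub_obs, sub_space)
        return out
    kind, arg = space
    if kind == "Discrete":
        vec = [0.0] * arg
        vec[obs] = 1.0
        return vec
    if kind == "Box":  # only 1-D boxes are handled in this sketch
        return [float(x) for x in obs]
    raise ValueError(f"unknown space: {space!r}")


operation_space = {
    "a": ("Discrete", 13),
    "b": [{"b1": ("Discrete", 3), "b2": ("Box", (1,)), "b3": ("Discrete", 7)}
          for _ in range(2)],
    "c": ("Discrete", 7),
}
observation_space = [operation_space] * 10

# One made-up operation, repeated for all 10 slots.
sample_op = {
    "a": 4,
    "b": [{"b1": 0, "b2": [0.5], "b3": 6}, {"b1": 2, "b2": [-3.0], "b3": 1}],
    "c": 3,
}
sample_obs = [sample_op] * 10

flat = flatten_obs(sample_obs, observation_space)
batch = [flat] * 32  # 32 rows stacked, analogous to a [32, 420] tensor

print(len(flat))                  # 420
print(len(batch), len(batch[0]))  # 32 420
```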

Case 3. model={"_disable_preprocessor_api": False, ...} with .experimental(_disable_preprocessor_api=True)
The obs_space now preserves the original nested structure; it is the same as self.observation_space.
The input_dict['obs'] preserves the original nested structure of my observation space but, unlike case 1, the Discrete subspaces are not one-hot encoded; a batch dimension is still added:

>>> input_dict['obs'][0]['a'].shape 
torch.Size([32])

Case 4. model={"_disable_preprocessor_api": True, ...} with .experimental(_disable_preprocessor_api=True)
The obs_space is the same as the self.observation_space.
The input_dict['obs'] is the same as in case 3.

These behaviours are hard to make sense of, and I cannot continue programming until I figure out exactly what they mean. Any explanation of how to use these parameters would be greatly appreciated. Thanks a lot in advance.
