Model.training never True

Previously, TorchModelV2.training would be set to False when sampling actions for rollouts and to True when training the model. With the latest nightly wheel (Aug 10), this is no longer the case: TorchModelV2.training is always False.

I believe this is a big issue: torch.nn.Module.training controls dropout, batch norm, etc., so these layers will silently stay in eval mode at train time.
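
For reference, this is plain PyTorch behavior, independent of RLlib. A minimal standalone demonstration of what goes wrong when the flag is stuck at False:

import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)
x = torch.ones(4)

drop.train()    # training mode: elements randomly zeroed, survivors scaled by 2
print(drop(x))  # a mix of 0. and 2. values

drop.eval()     # eval mode: dropout is the identity
print(drop(x))  # tensor([1., 1., 1., 1.]), which is what every dropout layer
                # silently does at train time if .training never becomes True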

This has been verified with a custom model training on CartPole:

def forward(self, input_dict, state, seq_lens):
    if self.training:
        raise Exception('Training')
    ...

But the model keeps training and never crashes, i.e. self.training is never True.
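
An alternative check that doesn't touch the model's forward() is a trainer callback. A minimal sketch, assuming the DefaultCallbacks API in current wheels and that the flag has already been toggled by the time the hook fires (CheckTrainingFlag is just an illustrative name):

from ray.rllib.agents.callbacks import DefaultCallbacks

class CheckTrainingFlag(DefaultCallbacks):
    def on_learn_on_batch(self, *, policy, train_batch, result, **kwargs):
        # Runs on the learner side; with correct behavior the underlying
        # nn.Module should be in train mode here. On the affected wheels
        # this fails because .training is stuck at False.
        assert policy.model.training, "model is in eval mode during training!"

Passing "callbacks": CheckTrainingFlag in the trainer config then makes every training iteration fail loudly instead of silently.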

Definitely a bug! Can you post this as a GitHub issue and tag @sven1977 and @michaelzhiluo?

I think the training flag is set to True in this line of code: ray/torch_policy.py at 3e010c5760c99be5a9940001f33db087c52eb8e7 · ray-project/ray · GitHub
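
For context, the toggling one would expect around that code looks roughly like this. Purely an illustrative sketch of the pattern, not the actual torch_policy.py source:

import torch

class SketchTorchPolicy:
    def __init__(self, model: torch.nn.Module):
        self.model = model

    def compute_actions(self, obs_batch):
        # Rollouts: eval mode, so dropout/batch norm are frozen.
        self.model.eval()
        with torch.no_grad():
            return self.model(obs_batch)

    def learn_on_batch(self, train_batch):
        # Training: train mode, so dropout/batch norm are active.
        # The report above says this switch never happens.
        self.model.train()
        return self.model(train_batch)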

Lmk otherwise

Installed the Aug 12 nightly to verify: the issue seems to have already been fixed (the script below now raises its exception at train time, i.e. self.training is finally True). The fix seems to be in [RLlib] Issue 17653: Torch multi-GPU (>1) broken for LSTMs. (#17657) · ray-project/ray@811d71b · GitHub. My test script, for posterity:

import gym
import ray
import torch
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.models.torch.misc import SlimFC
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class Env(gym.Env):
    # Minimal one-state environment: a single observation and a single action.
    def __init__(self, cfg):
        self.observation_space = gym.spaces.Discrete(1)
        self.action_space = gym.spaces.Discrete(1)

    def step(self, action):
        # obs, reward, done, info; episodes never end on their own, so
        # RLlib truncates them at its configured horizon.
        return 0, 0, False, {}

    def reset(self):
        return 0

class Model(TorchModelV2, torch.nn.Module):
    def __init__(
        self,
        obs_space,
        action_space,
        num_outputs,
        model_config,
        name,
        **custom_model_kwargs,
    ):
        TorchModelV2.__init__(
            self, obs_space, action_space, num_outputs, model_config, name
        )
        torch.nn.Module.__init__(self)
        self.num_outputs = num_outputs
        self.obs_dim = gym.spaces.utils.flatdim(obs_space)
        self.act_space = action_space
        self.act_dim = gym.spaces.utils.flatdim(action_space)
        self.values = None

        # These branches are never used in forward(); they only give the
        # model trainable parameters so the optimizer has something to step.
        self.logit_branch = SlimFC(
            in_size=1,
            out_size=self.num_outputs,
            activation_fn=None,
        )
        self.value_branch = SlimFC(
            in_size=1,
            out_size=1,
            activation_fn=None,
        )

    def forward(self, input_dict, state, seq_lens):
        # If the flag is ever set correctly, this raises during the learner's
        # forward pass and the run crashes, which is exactly the test.
        if self.training:
            raise Exception("Shit's wack, yo")

        logits = input_dict["obs_flat"].reshape(-1, 1)
        self.values = input_dict["obs_flat"].reshape(-1)
        return logits, []

    def value_function(self):
        assert self.values is not None, "must call forward() first"
        return self.values

cfg = {
    "env_config": {},
    "framework": "torch",
    "num_gpus": 1,  # set to 0 to run on CPU
    "env": Env,
    "model": {
        "custom_model": Model,
    },
}
ray.init()
analysis = tune.run(
    PPOTrainer,
    config=cfg,
)

Thanks for raising this @smorad!
The fix for this has been merged here: