RNN L2 weights regularization

Is there any plan for this feature?

You can get L2 loss with the custom loss API as follows.

    @override(ModelV2)
    def custom_loss(self, policy_loss, loss_inputs):
        l2_lambda = 0.01

        # Sum the L2 norms of all parameters in the model.
        l2_reg = torch.tensor(0.)
        for param in self.parameters():
            l2_reg += torch.norm(param)
        self.l2_loss = l2_lambda * l2_reg

        assert self.l2_loss.requires_grad, "l2 loss has no gradient"

        custom_loss = self.l2_loss

        # Depending on the input, add the loss.
        if self.hascustomloss:  # in case you only want to regularize based on a config, ...
            if isinstance(policy_loss, list):
                return [single_loss + custom_loss for single_loss in policy_loss]
            else:
                return policy_loss + custom_loss

        return policy_loss

    def metrics(self):
        metrics = {
            "weight_loss": self.l2_loss.item(),
        }
        # You can print the metrics to the command line here. With torch models
        # they are somehow not reported to the logger.
        print(metrics)
        return metrics
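
For completeness, wiring a model like this into an RLlib trainer typically looks roughly like the snippet below; the registered name and the model class are placeholders for illustration, not files from this thread.

    from ray.rllib.models import ModelCatalog

    # "l2_model" and MyTorchModelWithL2 are placeholder names.
    ModelCatalog.register_custom_model("l2_model", MyTorchModelWithL2)

    config = {
        "framework": "torch",  # the custom_loss above is torch code
        "model": {"custom_model": "l2_model"},
    }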

Thanks @Sertingolix. Interesting.
Could I combine this code with RecurrentNetwork in ray/recurrent_net.py at master · ray-project/ray · GitHub?

Happy to help.

This works with all parameters and sub-modules contained in your model, so it also works with the RNN; at least I'm not aware of anything that would break.
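
As a rough illustration (class and layer names here are placeholders, and forward_rnn / get_initial_state / value_function are omitted; see RLlib's rnn_model.py example), the custom_loss and metrics methods above can simply be added to a torch RecurrentNetwork subclass:

    import torch.nn as nn
    from ray.rllib.models.torch.recurrent_net import RecurrentNetwork as TorchRNN

    class MyRNNWithL2(TorchRNN, nn.Module):
        def __init__(self, obs_space, action_space, num_outputs, model_config, name):
            nn.Module.__init__(self)
            super().__init__(obs_space, action_space, num_outputs, model_config, name)
            self.hascustomloss = True  # enable the L2 term in custom_loss()
            # ... build fc / LSTM layers here ...

        # custom_loss() and metrics() exactly as defined above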

Thanks for your hints @Sertingolix
I have just tried to implement this in my repo at https://github.com/mg64ve/ray_custom_loss.
First of all I took custom_loss.py from the Ray repo and tested it. It works.
Then I wrote my torch custom loss model in https://github.com/mg64ve/ray_custom_loss/blob/main/custom_loss_l2_model.py using your code.
Then I copied custom_loss.py over to custom_loss_l2.py and adjusted some things.
I am getting the following error:

Failure # 1 (occurred at 2021-06-24_07-34-25)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/opt/conda/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PG.train_buffered() (pid=25896, ip=172.20.0.8)
  File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 473, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 476, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 107, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 486, in __init__
    super().__init__(config, logger_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/trainable.py", line 97, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 654, in setup
    self._init(self.config, self.env_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 134, in _init
    self.workers = self._make_workers(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 725, in _make_workers
    return WorkerSet(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 90, in __init__
    self._local_worker = self._make_worker(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 321, in _make_worker
    worker = cls(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 477, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1108, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/policy/tf_policy_template.py", line 214, in __init__
    DynamicTFPolicy.__init__(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 167, in __init__
    self.model = ModelCatalog.get_model_v2(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 427, in get_model_v2
    registered = set(instance.var_list)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TorchCustomLossL2Model' object has no attribute 'var_list'

Any good hint you can give me?
Is this what you meant?

Yeah, this is what I meant.

Running it with torch works for me. You would still have to set self.hascustomloss = True in your init; otherwise the policy loss is returned unchanged, as expected. Running without the framework=torch flag causes your error.

    python3 custom_loss_l2.py --framework=torch

You are right @Sertingolix
I have just changed it and it works. The following is the code:

custom_loss_l2.py

After that I added LSTM settings and the following is the code:

custom_loss_l2_lstm.py

with the following custom loss model:

custom_loss_l2_model_lstm.py

Basically I have removed the part that loads the JSON file with the weights, and I am now using PPO.
I am getting the following error:

RuntimeError: input.size(-1) must be equal to input_size. Expected 4, got 256

But I can't understand what the error is.
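
For reference, that RuntimeError is raised by PyTorch's LSTM when the last dimension of its input does not match the input_size it was constructed with. A minimal reproduction with the sizes from the message (illustrative only; in the model this likely means the 256-unit fc output is being fed into an LSTM built for the 4-dimensional obs, or the layer sizes are swapped):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=4, hidden_size=256, batch_first=True)
    x = torch.zeros(1, 10, 256)  # last dim is 256, but the LSTM expects 4
    lstm(x)  # RuntimeError: input.size(-1) must be equal to input_size. Expected 4, got 256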

So far I have not used the LSTM example. Does the LSTM example work without the custom loss? The custom loss should not affect it, as we only take the norm of the parameters, which is independent of the input size. I think there is also an LSTM example in the examples folder.

Thanks @Sertingolix.
I took the custom LSTM model from the examples and added your code.
It works!
The following is the code:

rnn_model.py

Thanks a lot.

Great, you are welcome :+1:

One more question @Sertingolix. I am not sure if I need to open a different thread.
With the "use_lstm" option I use "lstm_use_prev_action" and "lstm_use_prev_reward" to make the model learn from its trajectory.
In this case, where I use rnn_model.py as a custom RNN model, how can I make the model learn that way? Do I have to use the Trajectory View API?

That's something I have to check myself.

To my understanding the lstm flags are only needed when you want to use an RLlib LSTM wrapper. I think the trajectory view API is the right way to go.
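
For example, a custom model could request the previous action and reward through its view requirements. A minimal sketch (shift values and placement are illustrative, not taken from the thread's repos):

    from ray.rllib.policy.sample_batch import SampleBatch
    from ray.rllib.policy.view_requirement import ViewRequirement

    # Inside the model's __init__(), after calling the parent constructor:
    self.view_requirements[SampleBatch.PREV_ACTIONS] = ViewRequirement(
        SampleBatch.ACTIONS, shift=-1, space=self.action_space)
    self.view_requirements[SampleBatch.PREV_REWARDS] = ViewRequirement(
        SampleBatch.REWARDS, shift=-1)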

My plan for today is something similar with my own model. If I see an answer I'll post it.

Any update, @Sertingolix?

I now opted for using an attention_net. The wrapper module would work if you set "use_att" ("use_lstm") in your config for the model. As I want to use my own custom model (I do not know what you want to change in the future, ...), you could use the wrapper or create a model based on the wrapper. The attention network specifies the trajectory view for me.

To be honest I did not look at the base rnn model.

By setting the view requirements you should always get the last 50 obs concatenated:

    self.view_requirements[SampleBatch.OBS] = ViewRequirement(
        shift="-50:0", space=self.obs_space)

Check the dimensions in the forward method!
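
To illustrate the extra dimension (shapes are illustrative, assuming a flat Box observation space like CartPole's):

    import torch

    # shift="-50:0" is an inclusive range, so the obs view carries 51 timesteps.
    obs = torch.zeros(32, 51, 4)              # [batch, time, obs_dim]
    obs_flat = obs.reshape(obs.shape[0], -1)  # [32, 204], flattened before a dense layer
    print(obs_flat.shape)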

The problem for me is that I want to use a MultiDiscrete action space.
I have tried to adjust TorchFrameStackingCartPoleModel, but with num_frames=16 and action_space=gym.spaces.MultiDiscrete([2,2]) the following definition:

    self.view_requirements["prev_n_actions"] = ViewRequirement(
        data_col="actions",
        shift="-{}:-1".format(self.num_frames),
        space=self.action_space)

returns in the forward step a tensor of dimensions torch.Size([32, 16, 2]) and I can't understand why. After using torch_one_hot the dimensions are torch.Size([32, 2, 4]), which I still can't understand, since if you can choose actions among 2 values you should be able to one-hot encode them into a dimension of 2.
Any good hint for this?

Hi @mg64ve,

Do you have example code you could share? There is not currently enough information to see exactly what is happening with your model.

Yes @mannyv. I have an example in the following small repository:

StatelessCartPoleMD

Please have a look at the forward method in trajectory_view_utilizing_models.py

Hi there, I checked your code.
Everything works as expected.

The action you get from the view_requirement is the sampled action (that's also why you use one-hot in your model). Adding a time dimension results in a [BATCH, TIME, ACTION] tensor.

In your case the RLlib method does not work correctly because of the additional time dimension. Maybe also because you mix different kinds of spaces; the model's action space is a Box space of logits, ... I did not check how RLlib's one_hot function works.

Using the torch one_hot function works correctly:

    import torch.nn.functional as F
    actions = F.one_hot(input_dict["prev_n_actions"], 2)
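
For what it's worth, the shapes with action_space = MultiDiscrete([2, 2]) and 16 stacked frames would be as follows (illustrative tensors, not the actual model code):

    import torch
    import torch.nn.functional as F

    prev_n_actions = torch.randint(0, 2, (32, 16, 2))  # [batch, frames, sub-actions]
    one_hot = F.one_hot(prev_n_actions, 2)              # [32, 16, 2, 2]
    flat = one_hot.reshape(32, 16, -1)                  # [32, 16, 4]
    print(flat.shape)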

Thanks @Sertingolix, but the following is the Ray code:

def one_hot(x, space):
    if isinstance(space, Discrete):
        return nn.functional.one_hot(x.long(), space.n)
    elif isinstance(space, MultiDiscrete):
        return torch.cat(
            [
                nn.functional.one_hot(x[:, i].long(), n)
                for i, n in enumerate(space.nvec)
            ],
            dim=-1)
    else:
        raise ValueError("Unsupported space for `one_hot`: {}".format(space))

So MultiDiscrete should already be included.
When you write:

    actions = F.one_hot(input_dict["prev_n_actions"], 2)

what does the '2' stand for? Because action_space is MultiDiscrete([2,2]), I think the Ray code would one-hot encode it into 4 columns.
What do you think?

That's why the time dimension gets messed up. You would need:

    nn.functional.one_hot(x[:, :, i].long(), n)
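
A small sketch of what that adaptation could look like (this is not RLlib's code, just an illustration of handling the extra time axis):

    import torch
    import torch.nn as nn
    from gym.spaces import MultiDiscrete

    def one_hot_with_time(x, space):
        # x: [batch, time, len(space.nvec)] tensor of sampled sub-actions
        return torch.cat(
            [nn.functional.one_hot(x[:, :, i].long(), int(n))
             for i, n in enumerate(space.nvec)],
            dim=-1)

    acts = torch.randint(0, 2, (32, 16, 2))
    print(one_hot_with_time(acts, MultiDiscrete([2, 2])).shape)  # torch.Size([32, 16, 4])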

The 2 comes from num_classes=2. In theory you could also leave out the one-hot step, as the information is the same (the sampled action, see below). It might still make a difference in how easily your model can learn from it.

If you define your MultiDiscrete space as input, yes, RLlib would map it to [B, 4]. What I saw from other examples, and what the dimensions seem to confirm, is that in the view you always get the sampled action, which in this case is [B, 2] for one timestep.