Is there any plan for this feature?
You can get L2 loss with the custom loss API as follows.
@override(ModelV2)
def custom_loss(self, policy_loss, loss_inputs):
    l2_lambda = 0.01
    l2_reg = torch.tensor(0.)
    for param in self.parameters():
        l2_reg += torch.norm(param)
    self.l2_loss = l2_lambda * l2_reg
    assert self.l2_loss.requires_grad, "l2 loss no gradient"
    custom_loss = self.l2_loss
    # Depending on the input, add the loss.
    if self.hascustomloss:  # in case you want to regularize only based on a config, ...
        if isinstance(policy_loss, list):
            return [single_loss + custom_loss for single_loss in policy_loss]
        else:
            return policy_loss + custom_loss
    return policy_loss

def metrics(self):
    metrics = {
        "weight_loss": self.l2_loss.item(),
    }
    # You can print them to the command line here; with Torch models they are
    # somehow not reported to the logger.
    print(metrics)
    return metrics
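For context, here is a minimal sketch of how that method could sit inside a complete custom Torch model; the class name L2RegFCNet, the wrapped FullyConnectedNetwork, and the registration key are my own illustration, not code from this thread:

import torch
from torch import nn

from ray.rllib.models import ModelCatalog
from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork
from ray.rllib.utils.annotations import override


class L2RegFCNet(TorchModelV2, nn.Module):
    """Fully connected model whose custom_loss() adds the L2 term from above."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # Plain FC base model; the L2 regularization is added on top of it.
        self.fcnet = FullyConnectedNetwork(obs_space, action_space, num_outputs,
                                           model_config, name + "_fcnet")
        self.hascustomloss = True
        self.l2_loss = torch.tensor(0.0)

    @override(ModelV2)
    def forward(self, input_dict, state, seq_lens):
        return self.fcnet(input_dict, state, seq_lens)

    @override(ModelV2)
    def value_function(self):
        return self.fcnet.value_function()

    # custom_loss() and metrics() go here, exactly as in the snippet above.


ModelCatalog.register_custom_model("l2_reg_fcnet", L2RegFCNet)

You would then select it in the Trainer config via "model": {"custom_model": "l2_reg_fcnet"}.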
Thanks @Sertingolix. Interesting.
Could I combine this code with RecurrentNetwork in ray/recurrent_net.py at master · ray-project/ray · GitHub?
Happy to help.
This works with all parameters and sub-modules contained in your model, so it also works with the RNN; at least I'm not aware of anything that would break.
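If you want to check this for a specific model, a quick (purely illustrative) way is to print the parameter names; LSTM weights of sub-modules show up alongside the dense layers and therefore get the same L2 treatment:

# Inside your TorchModelV2/nn.Module subclass, e.g. at the end of __init__:
for name, param in self.named_parameters():
    print(name, tuple(param.shape))
# Sub-module parameters (e.g. something like lstm.weight_ih_l0 / lstm.weight_hh_l0
# if the model holds an nn.LSTM) are listed here, so the torch.norm(param) loop
# in custom_loss() regularizes them as well.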
Thanks for your hints @Sertingolix
I have just tried to implement this in my repo at https://github.com/mg64ve/ray_custom_loss.
First of all I took custom_loss.py from the Ray repo and tested it. It works.
Then I wrote my torch custom loss model in https://github.com/mg64ve/ray_custom_loss/blob/main/custom_loss_l2_model.py using your code.
Then I copied over custom_loss.py to custom_loss_l2.py and adjusted some things.
I am getting the following error:
Failure # 1 (occurred at 2021-06-24_07-34-25)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/opt/conda/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PG.train_buffered() (pid=25896, ip=172.20.0.8)
  File "python/ray/_raylet.pyx", line 439, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 473, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 476, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 107, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 486, in __init__
    super().__init__(config, logger_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/tune/trainable.py", line 97, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 654, in setup
    self._init(self.config, self.env_creator)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 134, in _init
    self.workers = self._make_workers(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 725, in _make_workers
    return WorkerSet(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 90, in __init__
    self._local_worker = self._make_worker(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/worker_set.py", line 321, in _make_worker
    worker = cls(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 477, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1108, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/policy/tf_policy_template.py", line 214, in __init__
    DynamicTFPolicy.__init__(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 167, in __init__
    self.model = ModelCatalog.get_model_v2(
  File "/opt/conda/lib/python3.8/site-packages/ray/rllib/models/catalog.py", line 427, in get_model_v2
    registered = set(instance.var_list)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TorchCustomLossL2Model' object has no attribute 'var_list'
Any good hints you can give me?
Is this what you meant?
Yeah, this is what I meant.
Running it in torch works for me. You would still have to set
self.hascustomloss = True
in your init; otherwise, as expected, the regularization term is not added. Running without the framework=torch flag causes your error.
python3 custom_loss_l2.py --framework=torch
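For example, in the model's __init__ you could hard-code the flag or drive it from the model config; the custom_model_config key "l2_reg" below is just my own illustration:

# In your model's __init__ (after TorchModelV2.__init__ / nn.Module.__init__):
custom_cfg = model_config.get("custom_model_config", {})
self.hascustomloss = custom_cfg.get("l2_reg", True)
# ...or simply: self.hascustomloss = True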
You are right @Sertingolix
I have just changed it and it works. The following is the code:
After that I added LSTM settings and the following is the code:
with the following custom loss model:
Basically I have removed the part that loads the JSON file with weights, and I am now using PPO.
I am getting the following error:
RuntimeError: input.size(-1) must be equal to input_size. Expected 4, got 256
But I can't understand what the error is.
So far I have not used the LSTM example. Does the LSTM example work without the custom loss? The custom loss should not affect it, as we only take the norm of the parameters, which is independent of their size. I think there is also an LSTM example in the examples folder.
Thanks @Sertingolix.
I took the custom LSTM model from the examples and added your code.
It works!
The following is the code:
Thanks a lot.
Great, you are welcome
One more question @Sertingolix. I am not sure if I need to open a different thread.
With the "use_lstm" option I use "lstm_use_prev_action" and "lstm_use_prev_reward" to make the model learn from its trajectory.
In this case I am using a custom RNN model (rnn_model.py); how can I make this model learn in the same way? Should I use the Trajectory View API?
That's something I have to check myself.
To my understanding the lstm flags are only needed when you want to use an RLlib LSTM wrapper. I think the trajectory view API is the right way to go.
My plan for today is something similar with my own model. If I find an answer I'll post it.
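To make the two options concrete, this is roughly how they differ in the model config (a sketch, assuming a custom model registered as "my_rnn_model"):

# Option 1: RLlib's built-in LSTM wrapper around the default model.
config["model"] = {
    "use_lstm": True,
    "lstm_use_prev_action": True,
    "lstm_use_prev_reward": True,
}

# Option 2: your own custom RNN model; it then has to declare any extra
# trajectory views (prev actions/rewards, obs windows, ...) itself.
# config["model"] = {"custom_model": "my_rnn_model"}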
Any update @Sertingolix?
I now opted for using an attention_net. The wrapper module would work if you set "use_att" ("use_lstm") in your config for the model. As I want to use my own custom model (I do not know what you want to change in the future, …), you could use the wrapper or create a model based on the wrapper. The attention network specifies the trajectory view for me.
To be honest I did not look at the base RNN model.
By setting the view requirements you should always get the last 50 obs concatenated:
self.view_requirements[SampleBatch.OBS] = ViewRequirement(shift="-50:0", space=self.obs_space)
Check the dimensions in the forward method!
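A quick way to see what that view requirement delivers (a sketch; the exact obs_dim depends on your environment) is to print the shape at the top of forward():

from ray.rllib.policy.sample_batch import SampleBatch

def forward(self, input_dict, state, seq_lens):
    obs = input_dict[SampleBatch.OBS]
    # shift="-50:0" is an inclusive range, so this holds the current obs plus
    # the 50 previous ones: roughly [batch_size, 51, obs_dim].
    print(obs.shape)
    ...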
The problem for me is that I want to use a MultiDiscrete action space.
I have tried to adjust TorchFrameStackingCartPoleModel, but with num_frames=16 and action_space=gym.spaces.MultiDiscrete([2,2]) the following definition:
self.view_requirements["prev_n_actions"] = ViewRequirement(
    data_col="actions",
    shift="-{}:-1".format(self.num_frames),
    space=self.action_space)
returns, in the forward step, a tensor of dimensions torch.Size([32, 16, 2]) and I can't understand why. After using torch_one_hot the dimensions are torch.Size([32, 2, 4]), which I still can't understand, since if you can choose actions among 2 values you should be able to one-hot encode into a dimension of 2.
Any good hints for this?
Hi @mg64ve,
Do you have example code you could share? There is not currently enough information to see exactly what is happening with your model.
Yes @mannyv. I have an example in the following small repository:
Please have a look at the forward method in trajectory_view_utilizing_models.py.
Hi there, I checked your code.
Everything works as expected.
The action you get from the view_requirement is the sampled action (that's also why you use one-hot in your model). Adding a time dimension results in a [BATCH, TIME, ACTION] tensor.
In your case the RLlib method does not work correctly because of the additional time dimension. Maybe also because you mix different kinds of spaces; the model action space is a box space of logits, … I did not check how RLlib's one_hot function works.
Using the torch one_hot function works correctly:
import torch.nn.functional as F
actions = F.one_hot(input_dict["prev_n_actions"], 2)
Thanks @Sertingolix, but the following is the Ray code:
def one_hot(x, space):
    if isinstance(space, Discrete):
        return nn.functional.one_hot(x.long(), space.n)
    elif isinstance(space, MultiDiscrete):
        return torch.cat(
            [
                nn.functional.one_hot(x[:, i].long(), n)
                for i, n in enumerate(space.nvec)
            ],
            dim=-1)
    else:
        raise ValueError("Unsupported space for `one_hot`: {}".format(space))
So MultiDiscrete should already be included.
When you write:
actions = F.one_hot(input_dict["prev_n_actions"], 2)
what does the "2" stand for? Because action_space is MultiDiscrete([2,2]), I think the Ray code would one-hot encode it into 4 columns.
What do you think?
That's why the time dimension gets messed up. You would need
nn.functional.one_hot(x[:, :, i].long(), n)
The 2 is num_classes=2. In theory you could also leave out the one-hot step, as the information is the same (the sampled action, read below). It might still make a difference in how easily your model can learn it.
If you define your MultiDiscrete space as input, yes, RLlib would map it to [B, 4]. What I saw from other examples, and the dimensions seem to confirm it, is that in the view you always get the sampled action, which in this case is [B, 2] for one timestep.
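Putting that together, a sketch of the per-sub-action one-hot that handles the extra time dimension (assuming input_dict["prev_n_actions"] has shape [B, T, 2] for MultiDiscrete([2, 2])):

import torch
import torch.nn.functional as F

prev = input_dict["prev_n_actions"]                  # [B, T, 2] sampled sub-actions
one_hot_parts = [
    F.one_hot(prev[:, :, i].long(), num_classes=n)   # [B, T, n] for each sub-action
    for i, n in enumerate([2, 2])                    # nvec of MultiDiscrete([2, 2])
]
prev_oh = torch.cat(one_hot_parts, dim=-1)           # [B, T, 4]
prev_flat = prev_oh.reshape(prev_oh.shape[0], -1).float()  # [B, T * 4] for a dense layer

This indexes the sub-action dimension with x[:, :, i] as suggested above and avoids the [32, 2, 4] shape that comes from slicing the time dimension with x[:, i].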