Hello, I wanted to try something out: I want to “Jump-Start” my agent as seen here: [2204.02372] Jump-Start Reinforcement Learning. The way they go about it is that, some of the time, they inject their own actions from a source other than the model. Therefore you would store the action you in…
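For anyone following along, here is a minimal sketch of the action-injection idea in plain Python. It assumes the simplest variant, where a guide policy acts for the first `guide_steps` steps of an episode and the learner takes over afterwards. The names `jump_start_rollout`, `guide_policy`, `learner_policy`, and `toy_env_step` are hypothetical, and nothing here is RLlib API. It records the action that was actually executed along with a flag saying where it came from.

```python
from typing import Callable, List, Tuple

# Toy types for illustration only: observations and actions are ints.
Policy = Callable[[int], int]
EnvStep = Callable[[int, int], Tuple[int, float, bool]]


def jump_start_rollout(
    guide_policy: Policy,
    learner_policy: Policy,
    env_step: EnvStep,
    initial_obs: int,
    guide_steps: int,
    max_steps: int,
) -> List[Tuple[int, int, float, bool]]:
    """Collect one episode; the guide acts for the first `guide_steps` steps."""
    trajectory = []
    obs = initial_obs
    for t in range(max_steps):
        from_guide = t < guide_steps
        action = guide_policy(obs) if from_guide else learner_policy(obs)
        next_obs, reward, done = env_step(obs, action)
        # Store the executed action plus a flag marking injected actions.
        trajectory.append((obs, action, reward, from_guide))
        obs = next_obs
        if done:
            break
    return trajectory


def toy_env_step(obs: int, action: int) -> Tuple[int, float, bool]:
    """Hypothetical chain environment: the state is a counter, the goal is 5."""
    next_obs = obs + action
    done = next_obs >= 5
    return next_obs, (1.0 if done else 0.0), done


demo = jump_start_rollout(
    guide_policy=lambda obs: 1,    # guide always moves forward
    learner_policy=lambda obs: 0,  # untrained learner stays put
    env_step=toy_env_step,
    initial_obs=0,
    guide_steps=3,
    max_steps=6,
)
```

How the injected actions should then be treated in the PPO loss is a separate question that this sketch does not address.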

Therefore, should I change my code to this?

    means = self._logits(self._features) if self._logits else self._features
    clamped_log_stds = torch.clamp(self.log_stds, -1.0, 1.0)
    clamped_log_stds = clamped_log_stds.unsqueeze(0).expand_as(means)
    logits = torch.cat([means, c…
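To check the shapes outside of RLlib, here is a standalone sketch of the same clamp-and-expand pattern. The tensor sizes and values are made up, and concatenating on the last axis is my assumption, since the snippet above is cut off at that point.

```python
import torch

batch_size, action_dim = 4, 2

# Stand-in for the network output: one mean per action dimension.
means = torch.zeros(batch_size, action_dim)

# Free (state-independent) log-std parameter, one entry per action dimension.
log_stds = torch.nn.Parameter(torch.tensor([-3.0, 0.5]))

# Clamp to [-1, 1], then broadcast the vector across the batch.
clamped_log_stds = torch.clamp(log_stds, -1.0, 1.0)
clamped_log_stds = clamped_log_stds.unsqueeze(0).expand_as(means)

# Assumption: means first, log-stds second, joined on the last axis.
logits = torch.cat([means, clamped_log_stds], dim=-1)
```

With these values the first log-std (-3.0) is clipped to -1.0 and the second (0.5) passes through unchanged.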

Hey @Samuel_Fipps, I saw that you were looking at clamping the log_stds, like @mannyv has been pointing to for what seems like years now lol. So, the actual clamping range of the log_stds depends on your action space. In the example that was given above from the PPO NAN logits, I un…

I tried to do this and it greatly slowed my training. I’m guessing gradient clipping wouldn’t work for this issue? But yes, this is my issue: I am getting the PPO NAN logits.

@Samuel_Fipps, do you have a sense of what is making it slow? I would not expect the clamping and shape wrangling to have that large an effect. I suppose if a lot of values are being clamped, that could cause a slowdown, because clamped values will not contribute to learning since they do no…
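A quick way to see the effect on learning is to look at what `torch.clamp` passes back during the backward pass. In this small sketch (values made up), entries outside the clamp range get a zero gradient, so they would not move during an update.

```python
import torch

# Two entries fall outside [-1, 1]; the middle one is inside.
log_stds = torch.tensor([-3.0, 0.0, 2.5], requires_grad=True)

clamped = torch.clamp(log_stds, -1.0, 1.0)
clamped.sum().backward()

# Gradient of the sum with respect to each original entry.
grad = log_stds.grad
```

Here `grad` is zero for the first and last entries and one for the middle entry.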

Just had an idea. Another option you have, rather than clamping, is to apply a threshold operator and use a straight-through estimator. Riffing on @tlaurie99’s example, it would be something like this:

    logits, _ = self.actor_fcnet(input_dict, state, seq_lens)
    means, log_stds = torch.chunk(logits, 2,…
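Since the snippet above is cut off, here is one common way to write a straight-through clamp in PyTorch. It may differ from what was originally posted. The helper name `clamp_straight_through`, the example values, and splitting on the last axis are all assumptions for illustration.

```python
import torch


def clamp_straight_through(x: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Forward pass returns the clamped value; backward pass acts like identity."""
    # The detached difference changes the value but carries no gradient.
    return x + (torch.clamp(x, low, high) - x).detach()


# Stand-in for the actor output: [mean_0, mean_1, log_std_0, log_std_1].
logits = torch.tensor([[0.1, -0.2, -3.0, 2.5]], requires_grad=True)

# Assumption: the split is along the last axis.
means, log_stds = torch.chunk(logits, 2, dim=-1)

log_stds_ste = clamp_straight_through(log_stds, -1.0, 1.0)
log_stds_ste.sum().backward()
```

The forward values are clipped to -1.0 and 1.0, but `logits.grad` is one for both log-std entries, so out-of-range values keep receiving a learning signal. The trade-off is that the gradient no longer matches the clamped function exactly.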

@Samuel_Fipps, oh, I thought that was what you were saying was slow. Yeah, you can give those a try.

Well, I did it myself following @tlaurie99’s example. I am hoping that I somehow messed it up or just had something go wrong with my run, and that using the built-in method will work. Thanks for all the help, guys!

So I enabled the two options instead of doing them myself: {"log_std_clip_param": 1} and {"free_log_std": True}. It didn’t seem to slow down the training this time (as in how long it takes to process the data), but my model just didn’t learn anything. Update: I commented out {"free_log_std": True} …

Hey @Samuel_Fipps, that seems to be a good strategy. Setting {"free_log_std": True} makes the log_stds a parameter of the model and, just like you said, I found that the agent normally doesn’t learn anything. As you go down in the log_std_clip_param you’ll find a spot, hopefully, where the agen…
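If it helps, the search over the clip value can be laid out as a list of model-config dicts, one per run. This is only a sketch: the candidate values are made up, `free_log_std` is left off as discussed above, and how the dict gets passed into your RLlib config depends on your Ray version and API stack, so that part is not shown.

```python
# Hypothetical candidate values, tried from loosest to tightest.
clip_candidates = [5.0, 2.0, 1.0, 0.5]

# One model-config dict per training run.
model_configs = [
    {"log_std_clip_param": clip, "free_log_std": False}
    for clip in clip_candidates
]
```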

Jump-Start Reinforcement Learning


Samuel_Fipps January 7, 2025, 2:49pm 18

Can I just use the 2 options that are talked about here?

PPO nan in actor logits - #6 by tlaurie99.
