How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
This is as much a conceptual question as a technical one: can parameter-noise exploration be used with any of the policy gradient or actor-critic methods? In my setting, action-space noise just doesn't work well, but I also need stochastic policies, so DQN isn't an option. The original parameter noise paper also applied the technique to a policy gradient method (TRPO), but I can't find much recent work on it. This question was asked here before, but the answer only pointed to DQN and DDPG. So I'm curious: can this be done in RLlib with, e.g., PPO or any other policy gradient or actor-critic algorithm?
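To make the question concrete, here is a minimal sketch of what parameter noise on a *stochastic* policy could look like, written in plain Python rather than against RLlib. Everything in it is hypothetical: the tiny linear softmax policy, the helper names (`perturb`, `sample_action`, `adapt_sigma`), and the `target_kl` threshold are illustrative assumptions, not RLlib APIs. The idea shown is: perturb a copy of the policy weights once per episode, keep sampling actions from the perturbed policy's distribution (so the policy stays stochastic), and adaptively rescale the noise so the perturbed policy stays a controlled distance (here, mean KL) from the unperturbed one.

```python
import math
import random
from typing import List

# Hypothetical linear softmax policy: logits = W @ obs (one row per action).
# This is an illustration only, not an RLlib model or API.
Matrix = List[List[float]]


def logits(weights: Matrix, obs: List[float]) -> List[float]:
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def perturb(weights: Matrix, sigma: float, rng: random.Random) -> Matrix:
    # Parameter-space noise: add Gaussian noise to every weight once;
    # the perturbed copy is then held fixed for a whole episode.
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]


def sample_action(weights: Matrix, obs: List[float], rng: random.Random) -> int:
    # The policy remains stochastic: sample from the perturbed softmax.
    probs = softmax(logits(weights, obs))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def kl(p: List[float], q: List[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def adapt_sigma(sigma: float, weights: Matrix, noisy: Matrix,
                observations: List[List[float]], target_kl: float,
                alpha: float = 1.01) -> float:
    # Adaptive scaling: grow sigma if the perturbed policy is closer than
    # target_kl (mean KL over observations) to the unperturbed one, else shrink it.
    d = sum(kl(softmax(logits(weights, o)), softmax(logits(noisy, o)))
            for o in observations) / len(observations)
    return sigma * alpha if d < target_kl else sigma / alpha


# Usage: 3 actions, 2-dimensional observations.
rng = random.Random(0)
weights = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
sigma = 0.1
observations = [[1.0, 0.5], [0.2, -1.0]]
noisy = perturb(weights, sigma, rng)  # resampled at the start of each episode
actions = [sample_action(noisy, o, rng) for o in observations]
sigma = adapt_sigma(sigma, weights, noisy, observations, target_kl=0.01)
print(actions, sigma)
```

One design question this sketch deliberately leaves open is how an on-policy update such as PPO should treat data collected by the perturbed weights, for example whether log-probabilities are computed under the perturbed or the unperturbed policy.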