I’ve been working on a project that requires passing the internal state of some agents to the environment for computing rewards. I believe I can’t send it directly as part of the action, because that would let gradients flow through the internal states. Is there a way to pass states to the environment? Or is there a way to disable the gradient for one part of the action? What would be an alternative to this? Thank you.
I am not sure I followed the part about gradients flowing through the internal states. There are no gradients during the sampling phase of the execution plan, only during the learning portion.
I do not know a nice clean way to do this in rllib. Maybe someone else will have a better approach.
I would probably try to implement this using a custom callback. The on_episode_step method has access to the worker and the environments. The worker has access to the policies, and they in turn have the model.
What I would do is create a custom model that stores whatever values I want to communicate as a member variable. Then, in the callback, I would take that info from the model and put it on the environment.
Keep in mind that on_episode_step is called after an action is taken, so you would be using the callback to add info to the environment for step t+1.
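Something along these lines (a rough, untested sketch; it assumes a torch policy, a Ray 1.x-style API, a single "default_policy", and that your env exposes a set_agent_info method, which is a made-up name for whatever hook you add):

```python
# Rough sketch, not tested. Assumes a torch policy and a Ray 1.x-style API
# (DefaultCallbacks, TorchModelV2, base_env.get_sub_environments(); in older
# Ray versions the latter is called get_unwrapped()).
import torch.nn as nn
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork


class StateExposingModel(TorchModelV2, nn.Module):
    """Wraps a default FC net and stashes the values to pass to the env."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.fcnet = FullyConnectedNetwork(obs_space, action_space, num_outputs,
                                           model_config, name + "_fcnet")
        self.internal_state = None  # member variable the callback will read

    def forward(self, input_dict, state, seq_lens):
        logits, state = self.fcnet(input_dict, state, seq_lens)
        # Store a detached copy of whatever internal values you want to expose
        # (the logits here are just a placeholder), so nothing can backprop
        # through this copy later.
        self.internal_state = logits.detach().cpu().numpy()
        return logits, state

    def value_function(self):
        return self.fcnet.value_function()


class PassInfoToEnvCallbacks(DefaultCallbacks):
    def on_episode_step(self, *, worker, base_env, episode, env_index=None, **kwargs):
        # worker -> policy -> model, then copy the stored info onto the env(s).
        # In a multi-agent setup you would look up the relevant policy IDs
        # instead of "default_policy".
        policy = worker.policy_map["default_policy"]
        info = policy.model.internal_state
        for env in base_env.get_sub_environments():
            env.set_agent_info(info)  # your env decides what to do with it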
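You would then register the model with ModelCatalog.register_custom_model and set "custom_model" and "callbacks" in the trainer config. And as noted above, the env would only see this info on the following step.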
Got it. This seems complicated, though. Just curious: is there any chance that rllib will support some type of fixed-size extra info passed from the agents to the env? It just needs to be handled the same way that actions are handled, but without the learning part.