Processing additional observations in real time

Ofir_Abu · January 3, 2022, 10:16pm

Hi all, hope that you can help
TL;DR: Technically, how can I process, in the same “forward” pass, both observations from the environment and some other hard-coded observations in order to compute an intrinsic loss (using the same model for both)?

I’m using a custom A3C model, and I want to hard-code the processing of additional observations, calculate an intrinsic loss from them, and then add this value to the general loss (so that the gradients flow for both the actually observed states and my hard-coded ones).

It seems that I have everything working (with dummy variables) except:
When “forward” calls “compute_intrinsic_reward”, which in turn takes some observations and calls self.actions_model.forward_rnn(…), the run gets stuck until it runs out of memory, and I don’t know where exactly it breaks.

Do you have any idea? I guess that hard-coding a call to the main model inside “forward” messes things up, but I don’t know how.

Using TF, and the “actions_model” inherits from RecurrentTFModelV2.

EDIT:
I think that even just an explanation of whether and how I can run some hard-coded observations through “forward_rnn” (during “forward”) would be very helpful!

Ofir_Abu · January 8, 2022, 9:28am

Anyone? Just any explanation that comes to mind for why I’d get the behavior of an infinite loop?

sven1977 · January 12, 2022, 2:48pm

Hey @Ofir_Abu , not sure I understand exactly what you have in mind, but it seems like you could configure custom “callbacks” (sub-class ray.rllib.agents.callbacks.DefaultCallbacks and override the on_postprocess_trajectory method). In there, you can add whatever data you would like to your train_batch and then use that extra data in your model’s forward passes. Or you could already do the forward pass through your intrinsic reward model in on_postprocess_trajectory and modify the “rewards” in the batch, so that your loss function sees the modified rewards.

Does this make sense?

An example of intrinsic rewards in RLlib is its Curiosity exploration module: ray.rllib.utils.exploration.curiosity.py. Also see this test case: ray.rllib.utils.exploration.tests.test_curiosity.py
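
As a rough illustration of the second option above (modifying the “rewards” in the batch), here is a minimal sketch in plain NumPy. The helper names intrinsic_reward and add_intrinsic_reward, the scale eta, and the squared-distance bonus are all hypothetical placeholders for whatever intrinsic model you actually use. The sketch also assumes the batch exposes the “obs”, “new_obs” and “rewards” columns; it uses a plain dict so that it runs without Ray installed.

```python
import numpy as np


def intrinsic_reward(obs, next_obs):
    """Hypothetical stand-in for an intrinsic-reward model.

    Returns one bonus per timestep: the squared distance between
    consecutive observations.
    """
    diff = np.asarray(next_obs, dtype=np.float32) - np.asarray(obs, dtype=np.float32)
    return np.sum(diff.reshape(len(diff), -1) ** 2, axis=1)


def add_intrinsic_reward(batch, eta=0.1):
    """Add a scaled intrinsic bonus to batch["rewards"] in place."""
    bonus = intrinsic_reward(batch["obs"], batch["new_obs"])
    batch["intrinsic_rewards"] = bonus  # extra column (assumed name) for later use
    batch["rewards"] = np.asarray(batch["rewards"], dtype=np.float32) + eta * bonus
    return batch


# In RLlib this could be hooked in roughly as follows (not executed here):
#
# class IntrinsicRewardCallbacks(DefaultCallbacks):
#     def on_postprocess_trajectory(self, *, worker, episode, agent_id,
#                                   policy_id, policies, postprocessed_batch,
#                                   original_batches, **kwargs):
#         add_intrinsic_reward(postprocessed_batch)

# Toy batch with two timesteps, standing in for a postprocessed trajectory.
batch = {
    "obs": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "new_obs": np.array([[1.0, 0.0], [1.0, 2.0]]),
    "rewards": np.array([1.0, 0.0]),
}
add_intrinsic_reward(batch, eta=0.5)
print(batch["rewards"])
```

Because the bonus is computed outside the model’s “forward”, the policy model is not called recursively from within its own forward pass. One caveat: quantities derived from rewards before this hook runs (for example advantages) may need to be recomputed for the bonus to take effect.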
