Processing additional observations in real time

Hi all, hope that you can help :slight_smile:
TL;DR: How, technically, can I process in the same “forward” pass both observations from the environment and some additional hard-coded observations, in order to compute an intrinsic loss (using the same model for both passes)?

I’m using a custom A3C model, and I want to hard-code the processing of additional observations, compute an intrinsic loss from them, and then add that value to the overall loss (so that gradients flow both through the actually observed states and through my hard-coded ones).
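For concreteness, the loss combination I have in mind is just a weighted sum; a minimal sketch, where `policy_loss`, `intrinsic_loss`, and `intrinsic_coeff` are illustrative names (not RLlib API):

```python
# Hypothetical sketch: mix an intrinsic loss into the policy loss so that
# gradients flow through both the observed and the hard-coded states.
# In TF, both terms would be tensors on the same graph; plain floats here
# just illustrate the arithmetic.

def total_loss(policy_loss, intrinsic_loss, intrinsic_coeff=0.1):
    """Return the overall loss: env-driven loss plus a scaled intrinsic term."""
    return policy_loss + intrinsic_coeff * intrinsic_loss

print(total_loss(2.0, 5.0))  # 2.0 + 0.1 * 5.0 = 2.5
```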

It seems that I have everything working (with dummy variables) except for one thing: when, inside “forward”, I call “compute_intrinsic_reward”, which in turn takes some observations and calls self.actions_model.forward_rnn(…), the run hangs until it runs out of memory, and I can’t tell where exactly it breaks.

Do you have any idea? I guess the hard-coded call to the main model inside “forward” messes things up, but I don’t know how.

Using TF; the “actions_model” inherits from RecurrentTFModelV2.

I think that even just an explanation of whether and how I can run (during “forward”) some hard-coded observations through “forward_rnn” would be very helpful!

Anyone? Just any explanation that comes to mind for why I’d get this infinite-loop behavior would help.

Hey @Ofir_Abu, not sure I understand exactly what you have in mind, but it seems like you could configure custom “callbacks” (sub-class ray.rllib.agents.callbacks.DefaultCallbacks and override the on_postprocess_trajectory method). In there, you can add whatever data you would like to your train_batch and then use that extra data in your model’s forward passes. Alternatively, you could do the forward pass through your intrinsic-reward model already in on_postprocess_trajectory and modify the “rewards” in the batch, so that your loss function already sees the modified rewards.
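A minimal sketch of the second option (modifying the rewards). The actual callback would subclass ray.rllib.agents.callbacks.DefaultCallbacks and override on_postprocess_trajectory; here I only show the batch-editing logic, with a plain dict standing in for the SampleBatch, and `add_intrinsic_rewards` / `coeff` as illustrative names:

```python
import numpy as np

# Hedged sketch: inside on_postprocess_trajectory you would receive a
# postprocessed_batch (a SampleBatch) and could rewrite its "rewards"
# column so the loss already sees env reward + intrinsic reward.

def add_intrinsic_rewards(postprocessed_batch, intrinsic_rewards, coeff=0.5):
    """Add scaled intrinsic rewards element-wise to the environment rewards."""
    postprocessed_batch["rewards"] = (
        postprocessed_batch["rewards"] + coeff * np.asarray(intrinsic_rewards)
    )
    return postprocessed_batch

batch = {"rewards": np.array([1.0, 0.0, -1.0])}
add_intrinsic_rewards(batch, [0.2, 0.4, 0.6])
print(batch["rewards"])  # [ 1.1  0.2 -0.7]
```

The same pattern works for the first option: instead of touching "rewards", write an extra column into the batch and read it back in your model’s forward pass.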

Does this make sense?

An example of intrinsic rewards in RLlib is its Curiosity exploration module; also see the accompanying test case for it.
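For reference, the Curiosity module is enabled purely through the trainer config. A sketch of such a config; the exact supported keys (`eta`, `lr`, `feature_dim`, `sub_exploration`) may vary between RLlib versions, so check your version’s curiosity source for the authoritative list:

```python
# Hedged sketch: turning on RLlib's built-in Curiosity exploration, which
# learns forward/inverse models and adds an intrinsic reward automatically.

config = {
    "framework": "tf",
    "exploration_config": {
        "type": "Curiosity",    # use the Curiosity exploration class
        "eta": 1.0,             # weight of the intrinsic reward term
        "lr": 0.001,            # learning rate for the curiosity nets
        "feature_dim": 288,     # size of the learned feature embedding
        "sub_exploration": {
            "type": "StochasticSampling",  # inner exploration strategy
        },
    },
}
print(config["exploration_config"]["type"])  # Curiosity
```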