Accessing info dicts in postprocessing callback

cwerner · December 3, 2020, 12:18am

Hello,

I recently updated from rllib 0.8.6 to rllib 1.0.1 and also converted from TF to Torch. Previously, I was using the info dict from my custom environment to pass along values that I used in the postprocessing callback. However, I no longer see the info key when I check the keys of the original batch (using original_batches[agent_id]) in my on_postprocess_trajectory callback. I only see:

(pid=17353) dict_keys(['obs', 'actions', 'rewards', 'dones', 'eps_id', 'agent_index', 'new_obs', 'action_dist_inputs', 'action_logp', 'vf_preds', 'unroll_id', 'advantages', 'value_targets'])

I tried changing the view_requirements function to include SampleBatch.INFOS but this did not work. How can I go about seeing the info dicts in this callback?

sven1977 · December 13, 2020, 10:28am

Thanks for asking this here @cwerner !
There is indeed a bug in 1.0.1, described (and fixed) here: https://github.com/ray-project/ray/issues/12509
The workaround for this version is to set _use_trajectory_view_api to False in your Trainer’s config.
This has been fixed in subsequent versions (master and upcoming 1.2.x).
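
As a minimal sketch of the workaround, the flag can be set in the config dict that is handed to the Trainer. Only `_use_trajectory_view_api` comes from this thread; the other keys and the `MyEnv` name in the comments are illustrative assumptions.

```python
# Sketch only: a plain config dict carrying the workaround flag.
# "framework" and "num_workers" are illustrative values, not requirements.
config = {
    "framework": "torch",
    "num_workers": 1,
    # Workaround for 1.0.1: disable the trajectory view API so the
    # "infos" column reaches the postprocessing callback again.
    "_use_trajectory_view_api": False,
}

# Assumed usage (not executed here; needs ray[rllib] 1.0.x installed):
#   from ray.rllib.agents.ppo import PPOTrainer
#   trainer = PPOTrainer(config=config, env=MyEnv)  # MyEnv is hypothetical
```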

roireshef · December 17, 2020, 11:30pm

Hi @sven1977,

Any chance you could show an example of how to add a field from the info dict to the trajectory view api, rather than disabling it? I’m actually facing the same use case.

Thanks,
Roi

sven1977 · December 19, 2020, 1:54pm

@roireshef, let me try to get this to work w/o disabling it. …

roireshef · December 19, 2020, 9:19pm

Thanks, looking forward to it. I’ll be following this thread.

If I understood correctly, the trajectory view api is reducing memory footprint by avoiding streaming all the information through the entire pipeline. Please correct me if I’m wrong, I assumed it’s generally better to use it vs not to use it, since it should probably make things run faster.

It’d be nice to be able to customize it to include only those additional fields we want access to. I’m not sure how easy it is to do so, and I couldn’t find any documentation on where to start. If you have an example of how to add a field to it, that should be enough for me…

Thanks,
Roi

sven1977 · December 20, 2020, 9:52am

It’s actually quite hackish to do this on top of 1.0.1 (several files need to be changed b/c of bugs).
The recommended way is to use the current master (or upcoming 1.2.x), where this has been fixed.
You will get env infos automatically in your loss or postprocessing function (if these functions need this field, i.e. access it in a test pass).

Documentation is in-flight (doc PR is in review).

Yes, the speedup on Atari for PPO was ~20%. For more “learn-heavy” algos (lots of updates vs action inference) like DQN or SAC, it’s not really faster, but definitely not slower either.

roireshef · December 20, 2020, 2:50pm

Hi @sven1977, I am using current master with the 1.2.0.dev0 wheel from a few days ago. Only with “_use_trajectory_view_api” set to False do I see the info dict appear in the postprocess_trajectory function.

I’m on commit 407a3523f367a1e2f124b4bdfb1aef3a2d4340a7 from today.

sven1977 · December 22, 2020, 12:35pm

In the initial call to the loss function (during Policy setup), you should see an all-0s train_batch being passed into the loss function (including all possible SampleBatch columns).
Then, if you access some column in your loss function, RLlib will detect this and provide that column in all subsequent calls.

So I just tried this:

1. Set a breakpoint in PPOTorchPolicy’s loss function.
2. Run the rllib/agents/ppo/tests/test_ppo.py::test_ppo_compilation_and_lr_schedule test case with ray.init(local_mode=True).
3. For the initial test call to the loss, I see “infos” in train_batch properly initialized with 0s.
4. Then, if I access this column in the loss function to tell RLlib that “infos” are needed (e.g. by printing train_batch[“infos”]), I do see this column also in all subsequent loss calls.
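
The detection described above can be illustrated with a toy, pure-Python sketch. This is not RLlib's actual implementation: `AccessTrackingDict` and `toy_loss` are hypothetical names, and the batch is a plain dict of lists rather than a SampleBatch.

```python
class AccessTrackingDict(dict):
    """Hypothetical dict that remembers which keys were read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed_keys = set()

    def __getitem__(self, key):
        # Record the read before delegating to the normal lookup.
        self.accessed_keys.add(key)
        return super().__getitem__(key)


def toy_loss(train_batch):
    # Reading "infos" here is what marks the column as required.
    _ = train_batch["infos"]
    return sum(train_batch["rewards"])


# Test pass: every possible column is present, filled with zeros.
dummy_batch = AccessTrackingDict(
    obs=[0.0, 0.0], rewards=[0.0, 0.0], infos=[0, 0], vf_preds=[0.0, 0.0]
)
toy_loss(dummy_batch)

# Only the columns touched during the test pass would be kept later.
required_columns = sorted(dummy_batch.accessed_keys)
print(required_columns)  # prints ['infos', 'rewards']
```

Columns that the loss never reads during the test pass ("obs" and "vf_preds" in this toy case) end up outside the required set, which is why a column must be accessed at least once in that pass.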

sven1977 · December 22, 2020, 12:39pm

Ah yes, I do see another error in torch now that has to do with the attempted conversion of the TrackingDict (from {} to a torch tensor, which fails).
Will fix this right now. …

I was wondering how QMIX works (it does access the “infos” column for each train step), but it completely overrides learn_on_batch, so there is no tracking dict!

sven1977 · December 22, 2020, 2:33pm

PR with a fix: https://github.com/ray-project/ray/pull/13039

janblumenkamp · January 11, 2021, 1:37am

I am noticing a similar issue with the current master. Sven, can you elaborate on how exactly the auto detection of required fields is intended to work? In the postprocessing function, I am currently extracting rewards from the info dict and those rewards are then processed to advantages. If the info dict is populated with 0 in the first pass, I cannot compute these advantages, and if I don’t compute the advantages, the loss function will not be able to do anything meaningful. Do I need a separate handling of the initial pass to the postprocessing/loss function that detects if the info dict is populated with 0 or with an actual dict?
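
As a sketch of what such separate handling could look like (an assumption, not a confirmed RLlib recommendation): check whether each info entry is a real dict and fall back to zeros otherwise. The helper name `extract_info_rewards` and the `shaped_reward` key are hypothetical.

```python
def extract_info_rewards(infos, key="shaped_reward"):
    """Return per-step rewards from infos, or zeros for a dummy batch.

    `key` is a hypothetical field name used only for this illustration.
    """
    rewards = []
    for info in infos:
        if isinstance(info, dict) and key in info:
            rewards.append(float(info[key]))
        else:
            # Dummy/test pass: the entry is a zero placeholder, not a dict.
            rewards.append(0.0)
    return rewards


# Real rollout: info entries are dicts produced by the environment.
real = extract_info_rewards([{"shaped_reward": 1.5}, {"shaped_reward": -0.5}])

# Initial test pass: info entries are zero placeholders.
dummy = extract_info_rewards([0, 0])
```

With a guard like this, the test pass still reads the "infos" column, and the advantage computation receives well-formed (all-zero) input instead of failing.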
