[rllib] Custom Evaluation: No actions column in the SampleBatch. How can we access actions in evaluation?

burakdmb · July 2, 2023, 8:38pm

Hi everyone,

I’m trying to write a custom evaluation function by following the example code in rllib/examples/custom_eval.py.

First, I should say that I’m very new to Ray and RLlib, so I apologize in advance if my question is ill-posed or wrong in context.

I’m working on modifying the example code for custom evaluation rounds (lines 129-133 of rllib/examples/custom_eval.py at the ray-2.5.1 tag of ray-project/ray on GitHub).

My problem is that I want to access the actions in these evaluation batches, but as can be seen below, the SampleBatch does not have a key named ‘actions’.

for _ in range(5):
    eval_result_dict = eval_workers.local_worker().sample()
eval_sample_batch = eval_result_dict['default_policy']
print(eval_sample_batch)

And the print output:

SampleBatch(21: ['obs', 'new_obs', 'rewards', 'terminateds', 'truncateds', 'infos', 'eps_id', 'unroll_id', 'agent_index', 't', 'vf_preds', 'advantages', 'value_targets'])

After some debugging and tracing through the code, I found a comment in env_runner_v2.py saying that the actions column will be populated by StateBufferConnector. Normally, my SampleBatch objects include an actions column as well, but I’m not able to access it in the evaluation code. I also tried different environments, but I guess the cause is the StateBufferConnector and evaluation mode.

ray-project/ray/blob/ray-2.5.1/rllib/evaluation/env_runner_v2.py#L539-L540

          # Last action (SampleBatch.ACTIONS) column will be populated by
          # StateBufferConnector.
To summarize, how can we access the actions column in a custom evaluation function?

person · September 14, 2024, 11:57pm

I would also like to know how to access the actions column for evaluation. When using the on_postprocess_trajectory callback function, I get no actions column when this callback is executed during evaluation.
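
One possible workaround, which nobody in this thread has confirmed, is to skip the sampled batches and step the environment manually inside the custom evaluation function, recording each action as it is computed. The sketch below assumes a Gymnasium-style environment (reset() returns (obs, info), step() returns a 5-tuple). The helper rollout_with_actions and the toy CountdownEnv are hypothetical names made up for illustration. In RLlib, compute_action could presumably be something like lambda obs: algorithm.compute_single_action(obs, explore=False), but that wiring is an assumption and has not been tested against ray 2.5.1.

```python
from typing import Any, Callable, Dict, List


def rollout_with_actions(
    env: Any,
    compute_action: Callable[[Any], Any],
    max_steps: int = 1000,
) -> Dict[str, List[Any]]:
    """Run one episode by hand and record obs, actions and rewards.

    Assumes a Gymnasium-style env:
      reset() -> (obs, info)
      step(action) -> (obs, reward, terminated, truncated, info)
    """
    episode: Dict[str, List[Any]] = {"obs": [], "actions": [], "rewards": []}
    obs, _info = env.reset()
    for _ in range(max_steps):
        # The action is computed here, so it is always available to record.
        action = compute_action(obs)
        next_obs, reward, terminated, truncated, _info = env.step(action)
        episode["obs"].append(obs)
        episode["actions"].append(action)
        episode["rewards"].append(reward)
        obs = next_obs
        if terminated or truncated:
            break
    return episode


class CountdownEnv:
    """Hypothetical toy env, used only to exercise the helper above."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state, {}

    def step(self, action):
        self.state -= 1
        terminated = self.state <= 0
        # Reward simply echoes the action so the recording is easy to check.
        return self.state, float(action), terminated, False, {}


# Stand-in policy: act on the parity of the observation.
episode = rollout_with_actions(CountdownEnv(start=3), lambda obs: obs % 2)
print(episode["actions"])
```

The trade-off is that a manual rollout bypasses the evaluation workers, so it runs serially, and any connectors or preprocessing the workers would apply have to be handled by whatever compute_action wraps.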
