RLlib: Evaluation on a batch of observations

We would like to evaluate our RL model on a batch of pre-saved observations (as opposed to the natural walk of the agent through the environment). We expected algo.compute_actions() to support this, but we cannot find an input representation that the function accepts. Ideally, we would pass the batches of observations as pandas DataFrames, NumPy arrays, Python dictionaries, lists, etc. Below is the minimal code that reproduces the issue with the data representations tested so far.
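For concreteness, given the algo built in the script below, the usage we are after looks roughly like this (a sketch of the desired call, not working code; the file name is a placeholder, and the exact input representation is precisely what we cannot figure out):

saved_obs = np.load("saved_observations.npy")  # hypothetical file of pre-saved observations, shape (N, 4) for CartPole
actions = algo.compute_actions(saved_obs)      # <- which representation does this expect?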

The errors we get with each input variant are included as comments above the corresponding calls:

Thanks in advance for any help!

“”"
! pip install ray
! pip install gymnasium
! pip install dm_tree
! pip install tensorflow
! pip install tensorflow-probability
#“”"

import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("tf2")
    .rollouts(num_rollout_workers=0)
    .build()
)

for i in range(10):
    print(i)
    print("algo.compute_single_action() SINGLE",
          algo.compute_single_action({"obs": np.random.rand(4)}))

#"""
# DOES NOT BREAK, BUT RETURNS ONLY A SINGLE ACTION
print("algo.compute_single_action() MANY",
      algo.compute_single_action({"obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4),
                                  "obs": np.random.rand(4)}))


"""
# SpecCheckingError: input spec validation failed on TfMLPEncoder.call, Mismatch found in data element ('obs',),
# which is a TensorSpec: Expected data type <class 'tensorflow.python.framework.tensor.Tensor'> but found NestedDict.
print("algo.compute_single_action()",
      algo.compute_single_action({"obs": [np.random.rand(4), np.random.rand(4)]}))
"""
# AttributeError: 'NoneType' object has no attribute 'transform'
print("algo.compute_actions()", algo.compute_actions({"obs": np.random.rand(4)}))
#print("algo.compute_actions()", algo.compute_actions({"observations": np.random.rand(4)}))
#print("algo.compute_actions()", algo.compute_actions({"observations": [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]}))
#print("algo.compute_actions()", algo.compute_actions({"observations": [[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]}))

# AttributeError: 'list' object has no attribute 'items'
#print("algo.compute_actions()", algo.compute_actions([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]))
#"""

"""
print("algo.compute_actions()", algo.compute_actions({"observations": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                       np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
#"""

"""
print("algo.compute_single_action()",algo.compute_actions({"observations": np.array([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()],
                                                                                           [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]])}))
#"""

Further evaluation showed that the issue is not with the representation of the observations at the input, but with the lack of a preprocessor for the default policy (the NoneType object in the error message is the missing preprocessor).
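A workaround we also considered, which avoids editing the installed sources (an untested sketch, assuming Policy.compute_actions() accepts an already-batched NumPy array), is to query the policy directly and bypass Algorithm.compute_actions() together with its per-agent preprocessor lookup:

policy = algo.get_policy()
batch_obs = np.random.rand(12, 4).astype(np.float32)  # 12 pre-saved CartPole observations
actions, _, _ = policy.compute_actions(batch_obs)     # returns (actions, state_outs, extra_fetches)
print(actions)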

It works once I modify ray\rllib\algorithms\algorithm.py:1750 (in my install: ~\AppData\Local\anaconda3\Lib\site-packages\ray\rllib\algorithms\algorithm.py) so that the block reads:

    policy = self.get_policy(policy_id)

    filtered_obs, filtered_state = [], []
    for agent_id, ob in observations.items():
        worker = self.workers.local_worker()

        # SEBTAC modification: skip the (missing) preprocessor.
        #preprocessed = worker.preprocessors[policy_id].transform(ob)
        preprocessed = ob

Can you explain why that preprocessor step is needed, and can it be made optional when such a preprocessor is not available or not needed?
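For example, a guard along these lines would make it optional (a hypothetical patch against the snippet above, not an official fix):

preprocessor = worker.preprocessors.get(policy_id)  # per-policy preprocessor dict; may hold None
preprocessed = preprocessor.transform(ob) if preprocessor is not None else ob  # fall back to the raw observation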

Best,

Sebastian