RLLIB Evaluation on a batch of observations

sebtac · December 7, 2023, 8:10pm

We would like to evaluate our RL model on a batch of pre-saved observations (as opposed to the agent's natural walk through the environment). We expected algo.compute_actions() to allow this, but we cannot find an input representation that the function accepts. Ideally, we would pass batches of observations as pandas DataFrames, NumPy arrays, Python dictionaries, lists, or similar. Below is minimal code that reproduces the issue with the data representations tested so far.

The error we get with each input version is noted in a comment above the corresponding call.

Thanks in advance for any help.

"""
! pip install ray
! pip install gymnasium
! pip install dm_tree
! pip install tensorflow
! pip install tensorflow-probability
#"""

import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig

algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("tf2")
    .rollouts(num_rollout_workers=0)
    .build()
)

for i in range(10):
    print(i)
    print("algo.compute_single_action() SINGLE", algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))

#"""
# DOES NOT BREAK BUT RETURNS ONLY A SINGLE ACTION
print("algo.compute_single_action() MANY",algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                "obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))


"""
# SpecCheckingError: input spec validation failed on TfMLPEncoder.call, Mismatch found in data element ('obs',), 
# which is a TensorSpec: Expected data type <class 'tensorflow.python.framework.tensor.Tensor'> but found NestedDict.
print("algo.compute_single_action()",algo.compute_single_action({"obs": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                                 np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
"""
#AttributeError: 'NoneType' object has no attribute 'transform'
print("algo.compute_actions()", algo.compute_actions({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_single_action()",algo.compute_actions({"observations": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_single_action()",algo.compute_actions({"observations": [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]}))
#print("algo.compute_single_action()",algo.compute_actions({"observations": [[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]}))

#AttributeError: 'list' object has no attribute 'items'
#print("algo.compute_single_action()",algo.compute_actions([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]))
#"""

"""
print("algo.compute_actions()", algo.compute_actions({"observations": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
                                                                       np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
#"""

"""
print("algo.compute_single_action()",algo.compute_actions({"observations": np.array([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()],
                                                                                           [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]])}))
#"""
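
One note on the "MANY" call above: a Python dict literal that repeats the same key keeps only the last value, so that call actually passes a single observation. Below is a minimal sketch in plain Python (no RLlib calls are made). The helper `batch_to_dict` is hypothetical and not part of RLlib; it only shows how each row of a batch could get its own key, on the assumption that compute_actions() treats every dict entry as one observation.

```python
# Plain-Python sketch; no RLlib calls are made here.

# Repeating a key in a dict literal keeps only the last value.
repeated = {
    "obs": [0.1, 0.2, 0.3, 0.4],
    "obs": [0.5, 0.6, 0.7, 0.8],
}
print(len(repeated))  # 1


def batch_to_dict(batch):
    """Hypothetical helper: give every row of a batch its own integer key."""
    return {i: row for i, row in enumerate(batch)}


batch = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
per_row = batch_to_dict(batch)
print(sorted(per_row))  # [0, 1, 2]
```

With unique keys, no row is silently dropped before the data reaches the function.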

sebtac · December 11, 2023, 5:16pm

Further investigation showed that the issue is not the representation of the input observations but the lack of a preprocessor for the default policy (the NoneType object in the error message is the missing preprocessor).

It works once I modify the file ~\AppData\Local\anaconda3\Lib\site-packages\ray\rllib\algorithms\algorithm.py:1750 as follows:

    policy = self.get_policy(policy_id)

    filtered_obs, filtered_state = [], []
    for agent_id, ob in observations.items():
        worker = self.workers.local_worker()
        
        # SEBTAC modification: bypass the missing preprocessor and use the raw observation.
        # preprocessed = worker.preprocessors[policy_id].transform(ob)
        preprocessed = ob

Can you explain why this preprocessing step is needed, and could it be made optional when no such preprocessor is available or needed?
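
To illustrate what "optional" could mean here, below is a minimal sketch in plain Python. It is not RLlib code: `ScalePreprocessor` and `maybe_preprocess` are hypothetical stand-ins, and the only assumption taken from the snippet above is that a preprocessor, when present, exposes a transform() method and is looked up by policy ID.

```python
# Plain-Python sketch; the names below are stand-ins, not RLlib classes.


class ScalePreprocessor:
    """Hypothetical preprocessor exposing a transform() method."""

    def transform(self, ob):
        return [2.0 * x for x in ob]


def maybe_preprocess(preprocessors, policy_id, ob):
    """Apply the policy's preprocessor if one exists, else return ob as-is."""
    preprocessor = preprocessors.get(policy_id)
    if preprocessor is None:
        return ob
    return preprocessor.transform(ob)


ob = [0.5, 1.0, 1.5, 2.0]

# No preprocessor registered for the policy: the observation passes through.
raw = maybe_preprocess({"default_policy": None}, "default_policy", ob)
print(raw)  # [0.5, 1.0, 1.5, 2.0]

# A preprocessor is registered: it is applied as before.
scaled = maybe_preprocess({"default_policy": ScalePreprocessor()}, "default_policy", ob)
print(scaled)  # [1.0, 2.0, 3.0, 4.0]
```

A guard like this would keep the current behaviour whenever a preprocessor exists and avoid the AttributeError when it is None.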

Best,

Sebastian
