We would like to evaluate our RL model on a batch of pre-saved observations (as opposed to the agent's natural walk through the environment). We expect that algo.compute_actions() should allow this, but we cannot find an input representation that the function accepts. Ideally, we would pass batches of observations as one of: pandas DataFrames, NumPy arrays, Python dictionaries, lists, … Below is minimal code that reproduces the issue with the data representations tested so far.
I am also listing the error we get with each input version:
Thanks in advance for any help.
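To make the goal concrete, here is a minimal sketch of the batch layout we have in mind: every saved representation is normalized into one float32 array of shape (batch, 4), one row per CartPole observation. The helper name `to_obs_batch` and the one-observation-per-key dict layout are assumptions made for illustration, not part of RLlib. A pandas DataFrame with one column per observation component should convert the same way through `np.asarray`, though that is not exercised here.

```python
import numpy as np

OBS_DIM = 4  # CartPole-v1 observations have four components


def to_obs_batch(saved, obs_dim=OBS_DIM):
    """Return saved observations as a float32 array of shape (batch, obs_dim).

    Hypothetical helper: accepts a list of observations, a 1-D or 2-D array,
    or a dict holding one observation per key (assumed layout).
    """
    if isinstance(saved, dict):
        # Assumed layout: {key: observation}, e.g. {0: obs0, 1: obs1}.
        saved = list(saved.values())
    batch = np.asarray(saved, dtype=np.float32)
    if batch.ndim == 1:
        # A single observation becomes a batch of one.
        batch = batch[np.newaxis, :]
    if batch.ndim != 2 or batch.shape[1] != obs_dim:
        raise ValueError(f"expected shape (batch, {obs_dim}), got {batch.shape}")
    return batch


rows = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
print(to_obs_batch(rows).shape)                      # (2, 4)
print(to_obs_batch({0: rows[0], 1: rows[1]}).shape)  # (2, 4)
print(to_obs_batch(np.array(rows[0])).shape)         # (1, 4)
```

Our guess is that an array like this could be handed to the policy object, for example via `algo.get_policy().compute_actions(batch)`, but we have not confirmed that this works with the tf2 setup below.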
"""
! pip install ray
! pip install gymnasium
! pip install dm_tree
! pip install tensorflow
! pip install tensorflow-probability
#"""
import numpy as np
from ray.rllib.algorithms.ppo import PPOConfig
algo = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("tf2")
    .rollouts(num_rollout_workers=0)
    .build()
)
for i in range(10):
    print(i)
    print("algo.compute_single_action() SINGLE",algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#"""
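# DOES NOT BREAK BUT RETURNS ONLY A SINGLE ACTION
# (a Python dict literal keeps only the last value for a repeated key, so the twelve "obs" entries below collapse into one observation)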
print("algo.compute_single_action() MANY",algo.compute_single_action({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
"""
# SpecCheckingError: input spec validation failed on TfMLPEncoder.call, Mismatch found in data element ('obs',),
# which is a TensorSpec: Expected data type <class 'tensorflow.python.framework.tensor.Tensor'> but found NestedDict.
print("algo.compute_single_action()",algo.compute_single_action({"obs": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
"""
#AttributeError: 'NoneType' object has no attribute 'transform'
print("algo.compute_actions()", algo.compute_actions({"obs": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_actions()",algo.compute_actions({"observations": np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])}))
#print("algo.compute_actions()",algo.compute_actions({"observations": [np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]}))
#print("algo.compute_actions()",algo.compute_actions({"observations": [[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]}))
#AttributeError: 'list' object has no attribute 'items'
#print("algo.compute_actions()",algo.compute_actions([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]]))
#"""
"""
print("algo.compute_actions()", algo.compute_actions({"observations": [np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]),
np.array([np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()])]}))
#"""
"""
print("algo.compute_actions()",algo.compute_actions({"observations": np.array([[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()],
[np.random.rand(), np.random.rand(), np.random.rand(), np.random.rand()]])}))
#"""
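A side note on the "MANY" attempt above: it returns a single action because repeating the key "obs" in a dict literal leaves only the last value in the dict. The sketch below shows this in plain Python and builds a dict with one unique key per saved observation instead. The `'list' object has no attribute 'items'` error suggests that algo.compute_actions() iterates over a dict, so a keyed dict like this may be the shape it expects; that is an assumption we have not verified, and the names `random_obs`, `saved_obs`, and `keyed` are made up for illustration.

```python
import random

random.seed(0)


def random_obs():
    """Stand-in for one pre-saved CartPole observation (four floats)."""
    return [random.random() for _ in range(4)]


saved_obs = [random_obs() for _ in range(12)]

# Repeating a key in a dict literal silently keeps only the last value.
collapsed = {"obs": saved_obs[0], "obs": saved_obs[1], "obs": saved_obs[2]}
print(len(collapsed))                    # 1
print(collapsed["obs"] == saved_obs[2])  # True

# One unique key per observation keeps all twelve.
keyed = {i: obs for i, obs in enumerate(saved_obs)}
print(len(keyed))                        # 12
```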