1. Severity of the issue:
High: Completely blocks me.
2. Environment:
- Ray version: 2.44.1
- Python version: 3.12.9
- OS: Linux
3. What happened vs. what you expected:
I saved tabular offline data, generated by a pretrained PPO agent, to .parquet files.
- I expected that loading the data from Parquet would give me a schema like this:
# Column Type
# ------ ----
# eps_id string
# agent_id null
# module_id null
# obs numpy.ndarray(shape=(4,), dtype=float)
# actions int32
# rewards double
# new_obs numpy.ndarray(shape=(4,), dtype=float)
# terminateds bool
# truncateds bool
# action_dist_inputs numpy.ndarray(shape=(2,), dtype=float)
# action_logp float
# weights_seq_no int64
similar to the schema shown in the offline data tutorial (Working with offline data — Ray 2.44.1).
- The actual result after loading looks like this:
# Column Type
# ------ ----
# eps_id string
# agent_id null
# module_id null
# obs string
# actions int32
# rewards double
# new_obs string
# terminateds bool
# truncateds bool
# action_dist_inputs numpy.ndarray(shape=(2,), dtype=float)
# action_logp float
# weights_seq_no int64
I have a PyTorch model that I want to train on this offline RL data, but as it stands the data is unusable for that purpose: obs and new_obs still appear to be serialized strings rather than the expected numpy arrays. I tried to deserialize the strings back to numpy arrays myself, but wasn't successful (a sketch of one attempt is at the end of section 4).
4. Source code:
Saving tabular data:
from ray.rllib.core.rl_module.default_model_config import DefaultModelConfig
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core import (
    COMPONENT_RL_MODULE,
)
from ray.rllib.core.rl_module import RLModuleSpec
# Set up a path for the tabular data records.
output_write_episodes = False
data_path = "tmp/offline_episodes/" if output_write_episodes else "tmp/offline_tabular/"
# Configure the algorithm for recording.
config = (
    PPOConfig()
    # The environment needs to be specified.
    .environment(
        env="CartPole-v1",
    )
    # Make sure to sample complete episodes because
    # you want to record RLlib's episode objects.
    .env_runners(
        batch_mode="complete_episodes",
    )
    # Set up 5 evaluation `EnvRunners` for recording.
    # Sample 50 episodes in each evaluation rollout.
    .evaluation(
        evaluation_num_env_runners=5,
        evaluation_duration=50,
    )
    # Use the checkpointed expert policy from the preceding PPO training.
    # Note, we have to use the same `model_config` as
    # the one with which the expert policy was trained, otherwise
    # the module state can't be loaded.
    .rl_module(
        model_config=DefaultModelConfig(
            fcnet_hiddens=[32],
            fcnet_activation="linear",
            # Share encoder layers between value network
            # and policy.
            vf_share_layers=True,
        ),
    )
    # Define the output path and format. In this example you
    # want to store tabular data, not RLlib's episode objects.
    .offline_data(
        output_write_episodes=output_write_episodes,
        output=data_path,
    )
)
# This is a checkpoint of a pretrained PPO agent with an average episode length of 500.
best_checkpoint = '/ray_results/docs_rllib_offline_pretrain_ppo/PPO_CartPole-v1_36f51_00000_0_2025-02-23_18-25-45/checkpoint_000112'
# Build the algorithm.
algo = config.build()
# Load the PPO-trained `RLModule` to use in recording.
algo.restore_from_path(
    best_checkpoint,
    # Load only the `RLModule` component here.
    component=COMPONENT_RL_MODULE,
)
# Run 100 iterations and evaluate (i.e. record data) in every 10th one,
# for 10 evaluation runs in total.
for i in range(100):
    print(f"Iteration {i + 1}")
    if i % 10 == 0:
        res_eval = algo.evaluate()
        print(res_eval)
# Stop the algorithm. Note, this is important when
# `output_max_rows_per_file` is defined. Otherwise, remaining
# episodes in the `EnvRunner`s' buffers aren't written to disk.
algo.stop()
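After the run, the recorded .parquet files sit under the output path. A quick, non-RLlib way I use to check that they were written (just listing the directory, assuming they end up somewhere below data_path):

from pathlib import Path

# List the Parquet files written during evaluation.
for p in sorted(Path("tmp/offline_tabular/").rglob("*.parquet")):
    print(p)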
Loading the .parquet file:
from ray import data
tabular_data_path = "tmp/offline_tabular/"
# Read the tabular data into a Ray dataset.
ds = data.read_parquet(tabular_data_path)
# Now, print its schema.
print("Tabular data schema of expert experiences:\n")
print(ds.schema())
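For reference, this is roughly the kind of decoding attempt I made on the loaded rows, without success. It's only a minimal sketch assuming the strings might be lz4/base64-packed observations handled by ray.rllib.utils.compression; I don't know whether that's actually how they were written:

import numpy as np
from ray.rllib.utils.compression import unpack_if_needed

# Take a single row and try to turn its `obs` string back into a numpy array.
row = ds.take(1)[0]
try:
    # Assumption: the string is a packed observation; this may not be the case.
    obs = np.asarray(unpack_if_needed(row["obs"]))
    print("Decoded obs:", obs.shape, obs.dtype)
except Exception as e:
    print("Could not decode the `obs` string:", e)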