Change or Generate offline data

I want to modify the offline data with my own actions, but I see that the observations in the .json file are encoded (in what format?). The .json file also contains many actions and previous rewards; do I need all of those if I want to modify or generate new offline data?

I want to create new offline data with one action for each observation. Is that possible?

It would be much easier if the action could be changed on the fly while simulating with offline data.

  • High: It blocks me from completing my task.

The columns of your offline data depend on the algorithm you are using. But at the time of writing, postprocessing of data is done in the sample collection stage of our data flow. That means that if your RolloutWorkers sample from the environment, they will also call the postprocess_trajectory() method of the policy that is used for collection. So upon collection of experiences from RolloutWorkers, you should already be supplied with the necessary columns.

Have a look at the custom rollout worker workflow example if you want to find a good place to start your data collection routine.
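If you want to generate offline data by hand instead, RLlib's JSON output format is simply one batch dict per line. Below is a minimal, hedged sketch under the assumption that you keep the columns uncompressed (i.e. `"output_compress_columns": []`); the column names follow RLlib's SampleBatch conventions, and the filename is made up:

```python
import json
import numpy as np

# One SampleBatch-style dict per line of the .json file.
# Columns are kept uncompressed here, which is what you get
# when you set "output_compress_columns": [] in the config.
batch = {
    "type": "SampleBatch",
    "obs": np.random.rand(2, 400).tolist(),       # two observations
    "actions": np.random.uniform(-1, 1, (2, 2)).tolist(),  # one action each
    "rewards": [1.0, 1.0],
    "dones": [False, True],
    "eps_id": [3, 3],  # both timesteps belong to the same episode
}

# Append one JSON line per batch (hypothetical output path).
with open("my_offline_data.json", "w") as f:
    f.write(json.dumps(batch) + "\n")
```

This gives you exactly one action per observation; whether additional columns (e.g. action log-probs) are required depends on the algorithm you feed the data to.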

The observations in the .json files that are included with RLlib are not encoded, as far as I can see. Can you point to the files you are referring to?


Thanks, I will take a look. I use APPO because it is the fastest, and I see it recommended in many places in the RLlib documentation.

This is what the observation in the .json file looks like:
{"type": "SampleBatch", "obs": "BCJNGEhAinECAAAAAAB4AAABgIAFlSYAAAAAAAAAjBJudW1weS

How do I convert it back to floats?

Please provide a script showing how you create this.

import gym
import numpy as np
import ray
from ray import tune
from ray.rllib.agents import ppo
from ray.tune.logger import pretty_print

class MyEnv(gym.Env):
    def __init__(self, config=None):
        super(MyEnv, self).__init__()

        self.timestep = 0
        self.action_space = gym.spaces.Box(
            low=-1, high=1, shape=(2,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(400,), dtype=np.float32)

    def _next_observation(self):
        obs = np.random.rand(400).astype(np.float32)
        return obs

    def _take_action(self, action):
        self._reward = 1

    def step(self, action):
        # Execute one time step within the environment.
        self._take_action(action)
        self.timestep += 1
        done = self.timestep > 99
        obs = self._next_observation()
        return obs, self._reward, done, {}

    def reset(self):
        self.timestep = 0
        self._reward = 0
        self.total_reward = 0
        self.visualization = None
        return self._next_observation()

if __name__ == "__main__":

    config = {
        "env": MyEnv,
        "framework": "torch",
        "num_gpus": 1,
        "horizon": 100,
        "num_workers": 2,
        "num_envs_per_worker": 2,
        "lr": 0.00001,
        "output": "TestOut",
        "output_max_file_size": 5000000,
    }

    stop = {
        "timesteps_total": 10,
    }

    tune_flag = False
    if tune_flag:
        results = tune.run("PPO", config=config, stop=stop)
    else:
        agent = ppo.APPOTrainer(config=config, env=MyEnv)
        for _ in range(100):
            result = agent.train()

In the config there is a key with these defaults:

   "output_compress_columns": ["obs", "new_obs"],

If you change that to an empty list, those columns will not be compressed. The compression used is LZ4, and the result is base64-encoded so it can be stored in JSON.
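To illustrate the compress-then-base64 pattern applied to those columns, here is a small sketch. Note this is not RLlib's actual implementation: RLlib's helpers are `pack()`/`unpack()` in `ray.rllib.utils.compression` and use LZ4, while this sketch substitutes stdlib `zlib` so it runs without extra dependencies:

```python
import base64
import zlib
import numpy as np

def pack_column(arr):
    # serialize -> compress -> base64, mirroring the shape of RLlib's pack()
    # (RLlib uses LZ4; zlib stands in here as a stdlib-only illustration)
    return base64.b64encode(zlib.compress(arr.tobytes())).decode("ascii")

def unpack_column(encoded, dtype, shape):
    # base64 -> decompress -> array, mirroring the shape of RLlib's unpack()
    raw = zlib.decompress(base64.b64decode(encoded))
    return np.frombuffer(raw, dtype=dtype).reshape(shape)

obs = np.arange(8, dtype=np.float32)
encoded = pack_column(obs)           # an ASCII string, safe to embed in JSON
decoded = unpack_column(encoded, np.float32, (8,))
assert np.array_equal(obs, decoded)
```

The base64 step is why the "obs" field in your .json file shows up as an opaque ASCII string rather than a list of floats.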

You can find the documentation here:


The actual compression code is here:


@mannyv Thanks!
And if I want to use the compressed data and decompress it, I assume I have to base64-decode it and then run lz4.frame.decompress?


You can just use unpack() from ray.rllib.utils.compression; it does the base64 decoding and LZ4 decompression for you.