RLlib: tune.run vs tune.Tuner serialization of trajectory data

Hi, I am rather new to Ray, so I am a bit lost with the following problem:

I want to track the environment steps (episodes, actually, but I'd be fine with parsing them afterwards) from multiple hyperparameter configurations of an RLlib agent. I already figured out that I can do this with the tune.run interface by specifying "output": "logdir".

This works nicely:


from ray import tune
from ray.rllib.algorithms.dqn import DQN

config = {
    "env": "CartPole-v1",
    "output": "logdir",  # saves the experiences to the agent log dir
    "output_compress_columns": [],  # compress no columns, so "obs" stays readable
    "lr": tune.grid_search([0.01, 0.001, 0.0001]),
}

result = tune.run(
    DQN,
    config=config,
)

Then I figured that tune.Tuner would probably be more powerful for what I want to achieve later. Buuuuuut…


from ray import train, tune
from ray.rllib.algorithms.dqn import DQN, DQNConfig

stopping_criteria = {"training_iteration": 10, "episode_reward_mean": 300}

tuner = tune.Tuner(
    trainable=DQN,
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
        num_samples=3,
    ),
    param_space={
        # first instantiate the default config and unpack it
        # fixme: callback
        **DQNConfig() \
            .training(sigma0=1.0) \
            .environment(env="CartPole-v1") \
            .rollouts(num_rollout_workers=1) \
            .resources(num_gpus=0) \
            .training(replay_buffer_config={
                "type": "MultiAgentPrioritizedReplayBuffer",
                "capacity": 60000,
                "prioritized_replay_alpha": 0.5,
                "prioritized_replay_beta": 0.5,
                "prioritized_replay_eps": 3e-6,
            }) \
            .to_dict(),

        # override it with the values you'd like to have.
        "n_step": 3,
        'output': 'logdir',  # Fixme: how to write out / parse the obs not in binary?

        # These params start off randomly drawn from a set.
        "lr": tune.loguniform(1e-4, 1e-1),

    },
    run_config=train.RunConfig(stop=stopping_criteria),
)
results = tuner.fit()

While this runs and produces the same JSON files with the trajectory data, I wonder why the "obs" values in those files are suddenly serialized. I have already tried several ways of deserializing them, but I can't find a) the reason why they are serialized with the Tuner (but not with tune.run) in the first place, or b) how I could deserialize them if there is no way around it.

To summarize (and maybe there is a smarter way of doing this in the first place):
I want to collect an offline dataset of the environment steps of agents configured with multiple hyperparameter configurations.

Thanks a lot for your help!

It turns out that 'output': 'logdir' needs "output_compress_columns": [] to write uncompressed observations, because the default is ['obs', 'new_obs'].
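
A minimal sketch of the fix, using plain dicts so it runs without Ray. Here base_config is only a stand-in for what DQNConfig()...to_dict() returns in the Tuner example (its values are illustrative), and offline_output is a hypothetical name for the two override keys. The point is the merge order: the unpacked default config already carries a compression setting, so the override has to come after it.

```python
# Settings that make RLlib write trajectories with plain (uncompressed) observations.
offline_output = {
    "output": "logdir",
    "output_compress_columns": [],  # empty list: compress no columns
}

# Stand-in for the dict unpacked from DQNConfig()...to_dict(); illustrative values only.
base_config = {
    "env": "CartPole-v1",
    "n_step": 3,
    "output_compress_columns": ["obs", "new_obs"],  # the default named above
}

# In a dict literal the later keys win, so the overrides go after the base config.
param_space = {**base_config, **offline_output}
```

In the Tuner call, this corresponds to adding "output_compress_columns": [] next to 'output': 'logdir' in param_space, below the unpacked default config.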

I had just copied the config for tune.run from another file and didn't give that argument much thought, because I didn't have proper documentation for it.
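
For parsing the steps into episodes afterwards, here is a rough sketch using only the standard library. It rests on several assumptions that should be checked against the actual files: that each line of an output file is one JSON batch, that the files match output-*.json, and that the batches carry "eps_id", "obs", "actions" and "rewards" columns of equal length. load_episodes is a hypothetical helper, not part of Ray.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_episodes(logdir, pattern="output-*.json"):
    """Group logged steps by episode id (hypothetical helper, not a Ray API)."""
    episodes = defaultdict(list)
    for path in sorted(Path(logdir).glob(pattern)):
        with path.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                batch = json.loads(line)
                # A compressed column shows up as one string instead of a list.
                if isinstance(batch["obs"], str):
                    raise ValueError(
                        f"{path.name}: 'obs' is compressed; "
                        "set output_compress_columns to []"
                    )
                # Assumed column names; adjust them to what the files contain.
                for i, eps_id in enumerate(batch["eps_id"]):
                    episodes[eps_id].append({
                        "obs": batch["obs"][i],
                        "action": batch["actions"][i],
                        "reward": batch["rewards"][i],
                    })
    return dict(episodes)
```

Calling load_episodes on a trial's log directory would then give a dict that maps each episode id to its list of steps, in the order they were written.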