RLlib and tune.run vs tune.Tuner: serialization of trajectory data

Tim_Ruhkopf · March 1, 2024, 9:15am

Hi, I am rather new to Ray, so I am a bit lost with the following problem:

I want to track the environment steps (episodes, actually, but I'd be fine with parsing them afterwards) from multiple hyperparameter configurations of an RLlib agent. I already figured out that I can do this with the tune.run interface by specifying "output": "logdir".

This works nicely:


from ray import tune
from ray.rllib.algorithms.dqn import DQN

config = {
    "env": "CartPole-v1",
    "output": "logdir",  # saves the experiences to the agent log dir
    "output_compress_columns": [],  # empty list: compress no columns (default is ["obs", "new_obs"])
    "lr": tune.grid_search([0.01, 0.001, 0.0001])
}


result = tune.run(
    DQN,
    config=config
)

Then I figured that tune.Tuner would probably be more powerful for what I want to achieve later. Buuuuuut…


from ray import train, tune
from ray.rllib.algorithms.dqn import DQN, DQNConfig

stopping_criteria = {"training_iteration": 10, "episode_reward_mean": 300}

tuner = tune.Tuner(
    trainable=DQN,
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
        num_samples=3,
    ),
    param_space={
        # first instantiate the default config and unpack it
        # fixme: callback
        **DQNConfig() \
            .training(sigma0=1.0) \
            .environment(env="CartPole-v1") \
            .rollouts(num_rollout_workers=1) \
            .resources(num_gpus=0) \
            .training(replay_buffer_config={
                "type": "MultiAgentPrioritizedReplayBuffer",
                "capacity": 60000,
                "prioritized_replay_alpha": 0.5,
                "prioritized_replay_beta": 0.5,
                "prioritized_replay_eps": 3e-6,
            }) \
            .to_dict(),

        # override it with the values you'd like to have.
        "n_step": 3,
        'output': 'logdir',  # Fixme: how to write out / parse the obs not in binary?

        # These params start off randomly drawn from a set.
        "lr": tune.loguniform(1e-4, 1e-1),

    },
    run_config=train.RunConfig(stop=stopping_criteria),
)
results = tuner.fit()

While this runs and produces the same JSON files with the trajectory data, I wonder why the "obs" in those files are suddenly serialized. I have already tried multiple ways of deserializing them, but I can't figure out a) why they are serialized with the Tuner (but not with tune.run) in the first place, and b) how I could deserialize them if there is no way around it.

To summarize (and maybe there is a smarter way of doing this in the first place):
I want to collect an offline dataset of the environment steps of agents configured with multiple hyperparameter configurations.

Thanks a lot for your help!

Tim_Ruhkopf · March 1, 2024, 10:55am

Turns out that 'output': 'logdir' needs "output_compress_columns": [] to write uncompressed observations, because the default is ['obs', 'new_obs'].
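
Applied to the Tuner setup from my first post, the fix would look roughly like this. This is only a sketch: `base_config` is a placeholder I made up to stand in for the `DQNConfig()....to_dict()` result, so the snippet runs without Ray installed. The point is that the override has to come after the unpacked defaults, otherwise the default list wins.

```python
# Placeholder (assumption) for the dict returned by DQNConfig()....to_dict().
base_config = {
    "env": "CartPole-v1",
    "output_compress_columns": ["obs", "new_obs"],  # the default mentioned above
}

param_space = {
    **base_config,
    # Keys listed after the unpacking override the defaults.
    "output": "logdir",
    "output_compress_columns": [],  # write every column uncompressed
}
```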

I had just copied the config from another file for the tune.run version and didn't give that argument much thought, because I didn't have proper documentation for it.
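
For parsing the steps into episodes afterwards, something like the following might work once the columns are uncompressed. It is a minimal sketch using only the standard library. It assumes each line of an output file is one JSON object with list-valued "obs", "actions", "rewards" and "eps_id" fields; those column names are what I would expect from the written sample batches, so check them against your own files. `load_episodes` is a helper name I made up, not part of RLlib.

```python
import json
from collections import defaultdict


def load_episodes(path, columns=("obs", "actions", "rewards")):
    """Group the rows of a JSON-lines output file by episode id.

    Assumes one JSON object per line with an "eps_id" list and one list
    per requested column (an assumption about the file layout).
    """
    episodes = defaultdict(lambda: {col: [] for col in columns})
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            batch = json.loads(line)
            for col in columns:
                # A compressed column is stored as a single string, not a list.
                if isinstance(batch[col], str):
                    raise ValueError(
                        f"column {col!r} looks compressed; "
                        "set 'output_compress_columns': [] when collecting"
                    )
            for i, eps_id in enumerate(batch["eps_id"]):
                for col in columns:
                    episodes[eps_id][col].append(batch[col][i])
    return dict(episodes)
```

Calling `load_episodes` on each JSON file in a trial's log directory and merging the resulting dicts should give one entry per episode for that hyperparameter configuration.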
