RLlib: checkpointing the environment in Tune

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Problem:
I am using RLlib and Tune together, and my environment (a Gym environment) keeps track of a certain state. Say, for simplicity, the number of episodes; in reality it is a more complicated state.

Now, if I stop the experiment before it is finished and resume it, I want to make sure all environments that are created are restored to the state they had when the experiment was stopped. For example, if one environment was at episode 2 and another at episode 3, they should load self.episodes = 2 and self.episodes = 3 respectively, or something similar.

That way, if I resume, I get exactly the same results (given that I have set a seed) as if I had never stopped and just let the experiment run. But since the environments are re-initialized in their initial state, the results are actually different.

I have looked at the Tune function API for checkpointing
https://docs.ray.io/en/latest/tune/api_docs/trainable.html#tune-function-docstring
but there I need a function with a checkpoint_dir argument, and I don't see how I can get access to that directory from inside the environment.
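
For reference, my understanding of that function API is roughly the sketch below (simplified and untested; the exact signatures depend on the Ray version, and state.json is just a stand-in file name). The point is that checkpoint_dir only exists inside this training function, which RLlib builds for me, so my environment never sees it:

```python
import json
import os

from ray import tune


def trainable(config, checkpoint_dir=None):
    step = 0
    # On resume, Tune passes the directory of the checkpoint to restore from.
    if checkpoint_dir:
        with open(os.path.join(checkpoint_dir, "state.json")) as f:
            step = json.load(f)["step"]

    while True:
        step += 1
        # On save, Tune hands out a directory to write checkpoint files into.
        with tune.checkpoint_dir(step=step) as ckpt_dir:
            with open(os.path.join(ckpt_dir, "state.json"), "w") as f:
                json.dump({"step": step}, f)
        tune.report(step=step)
```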

Before I used Ray, my environment had a get_state function that was called when a checkpoint was saved and a set_state function that was called when resuming. That is basically the functionality I want.
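
Concretely, something like this minimal sketch, where the episode counter stands in for the real, more complicated state:

```python
import gym


class CountingEnv(gym.Env):
    """Toy env whose only persistent state is an episode counter."""

    def __init__(self, config=None):
        self.episodes = 0
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        self.episodes += 1
        return 0

    def step(self, action):
        return 0, 0.0, True, {}

    # What I would like to be called around Tune's checkpointing:
    def get_state(self):  # called when a checkpoint is saved
        return {"episodes": self.episodes}

    def set_state(self, state):  # called when the experiment is resumed
        self.episodes = state["episodes"]
```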

Let me know if this is possible.

Edit: To clarify a bit more, consider the scenario where I have two runs, run_1 and run_2. With Tune, the checkpoints get stored in the format name_of_grid_search_experiment/run_1 and name_of_grid_search_experiment/run_2. Now if I stop the experiment while, for example, run_1 is at episode 45 and run_2 is at episode 46, I want to save those numbers so that when I resume, the environment of run_1 knows it is at episode 45 and the environment of run_2 knows it is at episode 46. How can I do this?

I cannot simply save this state somewhere on my own, because it needs to go into the checkpoint that Tune already generates (the checkpoint_00001 folder, etc.). But the environment does not know the directory and file name of the checkpoint currently being saved, and when loading (resuming) a model it also does not know which folder to look in.


Without modifying RLlib's source code, I guess you can use a custom env creator function to create your env. That way you know which env on which worker is being re-created.
As for checkpointing, can you just write your env state to a well-known cloud or network storage location? You don't have to save this info as part of the checkpoint.
You can then read your own env checkpoints back in your env creator function, as in the sketch below.
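
Something along these lines, as an untested sketch: STATE_DIR and MyStatefulEnv are placeholders for your own storage location and env class, and env_config is the EnvContext that RLlib passes to the creator, which exposes worker_index and vector_index:

```python
import json
import os

from ray.tune.registry import register_env

STATE_DIR = "/mnt/shared/env_states"  # placeholder: any storage all workers can reach


def env_creator(env_config):
    # env_config is an EnvContext; worker_index / vector_index identify this env copy.
    env = MyStatefulEnv(env_config)  # placeholder for your own env class
    state_file = os.path.join(
        STATE_DIR,
        f"worker_{env_config.worker_index}_env_{env_config.vector_index}.json",
    )
    if os.path.exists(state_file):
        with open(state_file) as f:
            env.set_state(json.load(f))  # restore the saved state on (re-)creation
    return env


register_env("my_stateful_env", env_creator)
```

Your env can write the matching file itself (e.g. at the end of every episode), so whatever the creator finds on re-creation is the latest state.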

If you are up to changing RLlib's source code, you can definitely add some code to get the states of the envs and make them part of the worker state, roughly along the lines of the sketch below.
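
A rough, untested sketch of that direction (subclassing the trainer rather than patching the source; method names and import paths differ across Ray versions, and it assumes your env has the get_state/set_state pair from above):

```python
import json
import os

from ray.rllib.agents.ppo import PPOTrainer  # or whichever trainer class you use


class StatefulPPOTrainer(PPOTrainer):
    """PPO trainer that persists per-env state next to the RLlib checkpoint."""

    def save_checkpoint(self, checkpoint_dir):
        path = super().save_checkpoint(checkpoint_dir)
        # One list of env states per rollout worker.
        env_states = self.workers.foreach_worker(
            lambda w: w.foreach_env(lambda env: env.get_state())
        )
        with open(os.path.join(checkpoint_dir, "env_states.json"), "w") as f:
            json.dump(env_states, f)
        return path

    def load_checkpoint(self, checkpoint_path):
        super().load_checkpoint(checkpoint_path)
        state_file = os.path.join(os.path.dirname(checkpoint_path), "env_states.json")
        if os.path.exists(state_file):
            with open(state_file) as f:
                env_states = json.load(f)
            # Push the saved states back into the envs here, e.g. via
            # self.workers.foreach_worker(...) calling env.set_state(...) on each env.
```

You would then pass StatefulPPOTrainer to tune.run() in place of the plain trainer class.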