Save played trajectories in memory

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Hi,
I have recently picked up RLlib, though I have a background in RL (specifically bandits). I'm trying to find an elegant way to save the trajectories (or, more specifically, the observations) seen by my agent in memory, not on disk.

I’ve read this post: How to save PPO trajectory and train at a later time that hits pretty close to my target, however I am wondering if there is a way to save these trajectories in memory to avoid extensive I/O to the disk. I understand that callbacks seems to be the way to do this, however, it seems like the only way to do this by using the dictionaries in their “intended” way would be to use the custom_metrics dictionary and just return raw custom metrics by using the configuration key "keep_per_episode_custom_metrics" (this would be a duct tape patch if it did work, imo).

Alternatively, I could save the states in the environment itself, but having read through the documentation for 2 days now, I can’t find a way to access a RolloutWorker’s attributes / environment object.

I’ve read further the RolloutWorker’s methods and found that I can call foreach_env on them.

Here’s how it works, for posterity:

  1. In your environment class, keep a list that stockpiles every observed state, e.g.:
def record_state(self):
    self.all_observed_states.append(self.current_state.clone())
  2. Write a getter for that list, also in the environment class, e.g.:
def get_obs_states(self):
    return self.all_observed_states
  3. In your main script, write a function that takes an environment and calls the env's getter, e.g.:
fn = lambda env: env.get_obs_states()
  4. Call this:
observations = [
    ray.get(worker.foreach_env.remote(fn))
    for worker in trainer.workers.remote_workers()
]

observations will then hold one entry per remote worker, each containing the list(s) of states its environment(s) saw during sampling.
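Putting the steps together, here is a minimal end-to-end sketch. It assumes Ray 2.x's PPOConfig API, an old-style gym env (4-tuple step), and a hypothetical RecordingCartPole wrapper; the only non-standard pieces are the all_observed_states list and its getter:

import gym
import ray
from ray.rllib.algorithms.ppo import PPOConfig  # Ray 2.x layout

class RecordingCartPole(gym.Wrapper):
    # Hypothetical env wrapper that stockpiles every observation in memory.
    def __init__(self, env_config=None):
        super().__init__(gym.make("CartPole-v1"))
        self.all_observed_states = []

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.all_observed_states.append(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.all_observed_states.append(obs)
        return obs, reward, done, info

    def get_obs_states(self):
        return self.all_observed_states

ray.init()
trainer = PPOConfig().environment(RecordingCartPole).rollouts(num_rollout_workers=2).build()
trainer.train()

# Pull the in-memory observation lists back from every remote worker's env(s).
fn = lambda env: env.get_obs_states()
observations = [
    ray.get(worker.foreach_env.remote(fn))
    for worker in trainer.workers.remote_workers()
]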

I don’t think this is “great” but I guess it works.

I'll mark this as the solution unless someone finds something better within 24 hours of the time of this post (2022-08-17T04:00:00Z).
