Save played trajectories in memory

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Hi,
I have recently picked up RLlib, though I have a background in RL (specifically bandits). I'm trying to find an elegant way to save the trajectories (or, more specifically, the observations) seen by my agent in memory, not on disk.

I’ve read this post: How to save PPO trajectory and train at a later time that hits pretty close to my target, however I am wondering if there is a way to save these trajectories in memory to avoid extensive I/O to the disk. I understand that callbacks seems to be the way to do this, however, it seems like the only way to do this by using the dictionaries in their “intended” way would be to use the custom_metrics dictionary and just return raw custom metrics by using the configuration key "keep_per_episode_custom_metrics" (this would be a duct tape patch if it did work, imo).

Alternatively, I could save the states in the environment itself, but having read through the documentation for 2 days now, I can’t find a way to access a RolloutWorker’s attributes / environment object.

I’ve read further the RolloutWorker’s methods and found that I can call foreach_env on them.

Here’s how it works, for posterity:

  1. In your environment class, keep a list that stockpiles every observed state, e.g.:
def record_state(self):
    self.all_observed_states.append(self.current_state.clone())
  2. Write a getter for that list, also in the environment class, e.g.:
def get_obs_states(self):
    return self.all_observed_states
  3. In your main script, write a function that takes an environment and calls the env's getter, e.g.:
fn = lambda env: env.get_obs_states()
  4. Call this:
observations = [
    ray.get(worker.foreach_env.remote(fn))
    for worker in trainer.workers.remote_workers()
]

observations will then hold one entry per remote worker, each containing the list(s) of states its environment(s) saw during sampling.
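Putting the steps together, here is a minimal end-to-end sketch. It assumes Ray 2.x's PPOConfig API, an old-style gym env (4-tuple step), and a hypothetical RecordingCartPole wrapper; the only non-standard pieces are the all_observed_states list and its getter:

import gym
import ray
from ray.rllib.algorithms.ppo import PPOConfig  # Ray 2.x layout

class RecordingCartPole(gym.Wrapper):
    # Hypothetical env wrapper that stockpiles every observation in memory.
    def __init__(self, env_config=None):
        super().__init__(gym.make("CartPole-v1"))
        self.all_observed_states = []

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.all_observed_states.append(obs)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.all_observed_states.append(obs)
        return obs, reward, done, info

    def get_obs_states(self):
        return self.all_observed_states

ray.init()
trainer = PPOConfig().environment(RecordingCartPole).rollouts(num_rollout_workers=2).build()
trainer.train()

# Pull the in-memory observation lists back from every remote worker's env(s).
fn = lambda env: env.get_obs_states()
observations = [
    ray.get(worker.foreach_env.remote(fn))
    for worker in trainer.workers.remote_workers()
]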

I don’t think this is “great” but I guess it works.

I'll mark this as the solution unless someone finds something better within 24 hours of the time of this post (2022-08-17T04:00:00Z).
