Compute/display actions from ray.tune

Hi everyone.

I have trained a PPO agent using:

from ray import tune

tune.run(
    run_or_experiment="PPO",
    config={
        "env": "Battery",
        "num_gpus": 1,
        "num_workers": 13,
        "num_cpus_per_worker": 1,
        "train_batch_size": 1024,
        "num_sgd_iter": 20,
        "explore": True,
        "exploration_config": {"type": "StochasticSampling"},
    },
    stop={"episode_reward_mean": 0.15},
    checkpoint_freq=200,
    local_dir="second_checkpoints",
)
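
For reference, tune.run also returns an analysis object, which (on Ray 1.x at least) should let me look up the checkpoint paths afterwards; a rough, untested sketch using what I believe is the ExperimentAnalysis API (get_best_trial / get_best_checkpoint):

# Rough sketch (untested): fetch the best checkpoint path from the
# ExperimentAnalysis object that tune.run returns on Ray 1.x.
analysis = tune.run(
    run_or_experiment="PPO",
    # ... same config / stop / checkpoint_freq / local_dir as above ...
)
best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
checkpoint_path = analysis.get_best_checkpoint(
    best_trial, metric="episode_reward_mean", mode="max"
)
print(checkpoint_path)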

I want to be able to load an agent from a checkpoint and display the action taken at each step.
Previously I used PPOTrainer directly, but I switched to tune in order to distribute the training.

How can I extract the agent from the ray.tune checkpoint so I can run a test episode and see what the actions look like? Something like:

while not done:
    action, state, logits = agent.compute_action(obs, state)
    obs, reward, done, info = env.step(action)
    episode_reward += reward


Hey @Carterbouley, great question!
You can do this after tune has completed training:

from ray.rllib.agents.ppo import PPOTrainer

# config should match the one used for training
trained_trainer = PPOTrainer(config=config)
trained_trainer.restore("[your checkpoint]")
# Then your while loop:
while not done:
    action, state, logits = trained_trainer.compute_action(obs, state)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
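
If you want to print the action at every step, a rollout loop like the following should work. This is only a sketch: BatteryEnv is a placeholder for however your "Battery" env is actually constructed, and it assumes the default (non-recurrent) PPO model, so no RNN state is passed:

# Sketch only: BatteryEnv is a placeholder for your actual "Battery" env class.
env = BatteryEnv()

obs = env.reset()
done = False
episode_reward = 0.0
step = 0
while not done:
    # Non-recurrent model -> no RNN state needed.
    action = trained_trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    print(f"step={step}  action={action}  reward={reward}")
    episode_reward += reward
    step += 1
print("episode_reward:", episode_reward)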

Hi @sven1977 Sven,

Thanks so much for getting back to me. I wasn't sure whether the config I passed to tune could be reused as the config for PPOTrainer. I now get an error stating that the PPO object has no attribute 'optimizer'.

Is there a way to fix this that you know of?

Thanks in advance for your help!

The stack trace is here if it helps!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-c721b7d26a42> in <module>
----> 1 trained_trainer.restore('checkpoint-12000')

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/tune/trainable.py in restore(self, checkpoint_path)
    364             self.load_checkpoint(checkpoint_dict)
    365         else:
--> 366             self.load_checkpoint(checkpoint_path)
    367         self._time_since_restore = 0.0
    368         self._timesteps_since_restore = 0

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py in load_checkpoint(self, checkpoint_path)
    695     def load_checkpoint(self, checkpoint_path: str):
    696         extra_data = pickle.load(open(checkpoint_path, "rb"))
--> 697         self.__setstate__(extra_data)
    698 
    699     @DeveloperAPI

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py in __setstate__(self, state)
    175         @override(Trainer)
    176         def __setstate__(self, state):
--> 177             Trainer.__setstate__(self, state)
    178             self.train_exec_impl.shared_metrics.get().restore(
    179                 state["train_exec_impl"])

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py in __setstate__(self, state)
   1238                 r.restore.remote(remote_state)
   1239         if "optimizer" in state:
-> 1240             self.optimizer.restore(state["optimizer"])
   1241 
   1242     @staticmethod

AttributeError: 'PPO' object has no attribute 'optimizer'

I have been searching but I don't see anyone else who has posted this problem!
Thanks in advance

Hmm, strange. Could you send me a short reproduction script that shows this behavior?
Then I can debug. Never seen this in our train/save/restore tests.

That's a very generous offer, Sven, thank you. I have sent you an email with further information!

Hi Sven,

I sent you an email and a message through this platform; could you confirm whether you have received either?

Thanks in advance!

Hi @sven1977, just wondering if I have sent you enough details via email to be able to recreate the problem?

Thanks,

Carter

Or if any of the other Ray team is available to help, that would be great too! Thanks!

Hi guys,

I believe I have fixed this. I was using Ray 0.8.3 to train and 1.2 when trying to restore. Restoring with Ray 0.8.4 fixed it.
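
In other words, the checkpoint has to be restored with (roughly) the same Ray version it was created with. A quick way to check which version you are running:

import ray
print(ray.__version__)  # restore with the same version used for training,
                        # e.g. pip install "ray[rllib]==0.8.4"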


@Carterbouley , this is great! Sorry, hadn’t had time to look at this yet. Thanks for sharing a solution.
