Compute/display actions from ray.tune

Hi everyone.

I have trained a PPO agent using:

from ray import tune

tune.run(
    run_or_experiment="PPO",
    config={
        "env": "Battery",
        "num_gpus": 1,
        "num_workers": 13,
        "num_cpus_per_worker": 1,
        "train_batch_size": 1024,
        "num_sgd_iter": 20,
        "explore": True,
        "exploration_config": {"type": "StochasticSampling"},
    },
    stop={"episode_reward_mean": 0.15},
    checkpoint_freq=200,
    local_dir="second_checkpoints",
)
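
For reference, tune.run also returns an analysis object, which (on Ray 1.x at least) should let me look up the checkpoint paths afterwards; a rough, untested sketch using what I believe is the ExperimentAnalysis API (get_best_trial / get_best_checkpoint):

# Rough sketch (untested): fetch the best checkpoint path from the
# ExperimentAnalysis object that tune.run returns on Ray 1.x.
analysis = tune.run(
    run_or_experiment="PPO",
    # ... same config / stop / checkpoint_freq / local_dir as above ...
)
best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
checkpoint_path = analysis.get_best_checkpoint(
    best_trial, metric="episode_reward_mean", mode="max"
)
print(checkpoint_path)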

I want to be able to load an agent from a checkpoint and display the action taken at each step.
Previously I used PPOTrainer directly, but I switched to tune in order to distribute the training.

How can I extract the agent from the ray.tune checkpoint so I can run a test episode and see what the actions look like? Something like:

while not done:
    action, state, logits = agent.compute_action(obs, state)
    obs, reward, done, info = env.step(action)
    episode_reward += reward


Hey @Carterbouley, great question!
You can do this after tune has completed training:

from ray.rllib.agents.ppo import PPOTrainer

# config should match the one used for training
trained_trainer = PPOTrainer(config=config)
trained_trainer.restore("[your checkpoint]")
# Then your while loop:
while not done:
    action, state, logits = trained_trainer.compute_action(obs, state)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
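
If you want to print the action at every step, a rollout loop like the following should work. This is only a sketch: BatteryEnv is a placeholder for however your "Battery" env is actually constructed, and it assumes the default (non-recurrent) PPO model, so no RNN state is passed:

# Sketch only: BatteryEnv is a placeholder for your actual "Battery" env class.
env = BatteryEnv()

obs = env.reset()
done = False
episode_reward = 0.0
step = 0
while not done:
    # Non-recurrent model -> no RNN state needed.
    action = trained_trainer.compute_action(obs)
    obs, reward, done, info = env.step(action)
    print(f"step={step}  action={action}  reward={reward}")
    episode_reward += reward
    step += 1
print("episode_reward:", episode_reward)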

Hi @sven1977 Sven,

Thanks so much for getting back to me. I wasn't sure whether the config I passed to tune could be reused as the config for PPOTrainer. I now get an error stating that the PPO object has no attribute 'optimizer'.

Is there a way to fix this that you know of?

Thanks in advance for your help!

The stack trace is here if it helps!

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-c721b7d26a42> in <module>
----> 1 trained_trainer.restore('checkpoint-12000')

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/tune/trainable.py in restore(self, checkpoint_path)
    364             self.load_checkpoint(checkpoint_dict)
    365         else:
--> 366             self.load_checkpoint(checkpoint_path)
    367         self._time_since_restore = 0.0
    368         self._timesteps_since_restore = 0

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py in load_checkpoint(self, checkpoint_path)
    695     def load_checkpoint(self, checkpoint_path: str):
    696         extra_data = pickle.load(open(checkpoint_path, "rb"))
--> 697         self.__setstate__(extra_data)
    698 
    699     @DeveloperAPI

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py in __setstate__(self, state)
    175         @override(Trainer)
    176         def __setstate__(self, state):
--> 177             Trainer.__setstate__(self, state)
    178             self.train_exec_impl.shared_metrics.get().restore(
    179                 state["train_exec_impl"])

~/Documents/rl_agent/ray-venv/lib/python3.8/site-packages/ray/rllib/agents/trainer.py in __setstate__(self, state)
   1238                 r.restore.remote(remote_state)
   1239         if "optimizer" in state:
-> 1240             self.optimizer.restore(state["optimizer"])
   1241 
   1242     @staticmethod

AttributeError: 'PPO' object has no attribute 'optimizer'

I have been searching but I don't see anyone else who has posted this problem!
Thanks in advance

Hmm, strange. Could you send me a short reproduction script that shows this behavior?
Then I can debug. Never seen this in our train/save/restore tests.

That's a very generous offer, Sven, thank you. I have sent you an email with further information!

Hi Sven,

I sent you an email and a message through this platform; could you confirm whether you have received either?

Thanks in advance!

Hi @sven1977, just wondering if I have sent you enough details via email to be able to recreate the problem?

Thanks,

Carter

Or if any of the other Ray team is available to help, that would be great too! Thanks!

Hi guys,

I believe I have fixed this. I was using Ray 0.8.3 to train and 1.2 when trying to restore. Restoring with Ray 0.8.4 fixed it.
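
In other words, the checkpoint has to be restored with (roughly) the same Ray version it was created with. A quick way to check which version you are running:

import ray
print(ray.__version__)  # restore with the same version used for training,
                        # e.g. pip install "ray[rllib]==0.8.4"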


@Carterbouley , this is great! Sorry, hadn’t had time to look at this yet. Thanks for sharing a solution.
