How to load from a checkpoint and call the environment

Severity: between Low (it annoys or frustrates me for a moment) and Medium (it contributes significant difficulty to completing my task, but I can work around it).

I’m working on my master’s thesis, which is about applying reinforcement learning to a PlayStation 1 game.
This is a 30-minute summary of what I did:

But here is the situation.
My gymnasium environment makes a TCP connection to an emulator upon instantiating the environment.

I originally had a lot of trouble getting this working, but here is my final workaround:

import gymnasium
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from myRTClass import MyGranTurismoRTGYM, DEFAULT_CONFIG_DICT

# rtgym configuration: attach my interface and its timing parameters
my_config = DEFAULT_CONFIG_DICT
my_config["interface"] = MyGranTurismoRTGYM
my_config["time_step_duration"] = 0.05
my_config["start_obs_capture"] = 0.05
my_config["time_step_timeout_factor"] = 1.0
my_config....

def env_creator(env_config):
  env = gymnasium.make("real-time-gym-v1", config=my_config)
  return env  # return an env instance

from ray.tune.registry import register_env
register_env("gt-rtgym-env-v1", env_creator)

ray.init()

algo = (
    PPOConfig()
    .environment(
        env="gt-rtgym-env-v1",
        disable_env_checking=True,
        render_env=False,
        )
...
    .build()
)

# Some training loops and then..

path_to_checkpoint = algo.save()

And things work well: the agent is learning and getting better.

Now I wish to reload this training (based on this example shown on the documentation site):

import gymnasium
import ray
from ray.rllib.algorithms.algorithm import Algorithm
from myRTClass import MyGranTurismoRTGYM, DEFAULT_CONFIG_DICT

my_config = DEFAULT_CONFIG_DICT
my_config["interface"] = MyGranTurismoRTGYM
my_config["time_step_duration"] = 0.05
my_config["start_obs_capture"] = 0.05
my_config["time_step_timeout_factor"] = 1.0

def env_creator(env_config):
  env = gymnasium.make("real-time-gym-v1", config=my_config)
  return env  # return an env instance

from ray.tune.registry import register_env
register_env("gt-rtgym-env-v1", env_creator)
ray.init()

algo = Algorithm.from_checkpoint("C:/PPO_gt-rtgym-env-v1_2023-05-13_00-31-13_kmaa6ii/checkpoint_000161")

episode_reward = 0
terminated = truncated = False
obs, info = env.reset()

while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action) # how to access the same env that the algo is re-attempting to create?

    episode_reward += reward

I cannot use the above method because, in my case, algo = Algorithm.from_checkpoint("C:/PPO_gt-rtgym-env-v1_2023-05-13_00-31-13_kmaa6ii/checkpoint_000161") re-instantiates an environment (I can see the agent trying to connect to the emulator via TCP).

Then obviously obs, info = env.reset() does not work, since there is no env.
If I attempt env = gymnasium.make("real-time-gym-v1"), that obviously also does not work, since it tries to instantiate a second environment (which would try to connect to another emulator).

So - question here is:

  1. Is there a way for me to directly call reset() on the environment that the restored algorithm creates (something like the restored algo’s environment)?

  2. Is there a way for me to drop the restored environment and make use of my own one instead?

FYI, I searched a lot on this, but it is a bit challenging with the changes in the API.

I have also since tried loading the policy instead of the full algorithm:

from ray.rllib.policy.policy import Policy

my_restored_policy = Policy.from_checkpoint("C:/Users/nadir/ray_results/PPO_gt-rtgym-env-v1_2023-05-13_00-31-13_kmaa6ii/checkpoint_000161")

env = gymnasium.make("real-time-gym-v1", config=my_config)

episode_reward = 0
terminated = truncated = False
obs, info = env.reset()

while not terminated and not truncated:
    action = my_restored_policy.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward

I get the following error:

AttributeError: 'dict' object has no attribute 'compute_single_action'

New attempt… to remove the environment…

#1 Load the algorithm again…

from ray.rllib.algorithms.algorithm import Algorithm
import ray
from myRTClass import MyGranTurismoRTGYM, DEFAULT_CONFIG_DICT
import gymnasium


my_config = DEFAULT_CONFIG_DICT
my_config["interface"] = MyGranTurismoRTGYM
my_config["time_step_duration"] = 0.05
my_config["start_obs_capture"] = 0.05
my_config["time_step_timeout_factor"] = 1.0
my_config["act_buf_len"] = 3
my_config["reset_act_buf"] = False
my_config["benchmark"] = True
my_config["benchmark_polyak"] = 0.2

def env_creator(env_config):
  env = gymnasium.make("real-time-gym-v1", config=env_config)
  return env  # return an env instance

from ray.tune.registry import register_env
register_env("gt-rtgym-env-v1", lambda config: env_creator(my_config)) # better way

ray.init()

# env = gymnasium.make("real-time-gym-v1")
algo = Algorithm.from_checkpoint("C:/Users/nadir/ray_results/PPO_gt-rtgym-env-v1_2023-05-13_00-31-13_kmaa6ii/checkpoint_000161")
algo.stop()

Get the policy / model to extract the obs_space and action_space:

policy = algo.get_policy()
model = policy.model

Create a new config:

new_config = {
    # Indicate that the Algorithm we setup here doesn't need an actual env.
    "env": None,
    "observation_space": model.obs_space,
    "action_space": model.action_space,
    # ...
}

Change the loaded algo’s config:

algo.config = new_config

Manually load an environment:

env = gymnasium.make("real-time-gym-v1", config=my_config)

Run things the classic gymnasium way:

episode_reward = 0
terminated = truncated = False
obs, info = env.reset()

while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward

Errors:

   1486 # `unsquash_action` is None: Use value of config['normalize_actions'].
   1487 if unsquash_action is None:
-> 1488     unsquash_action = self.config.normalize_actions
   1489 # `clip_action` is None: Use value of config['clip_actions'].
   1490 elif clip_action is None:

AttributeError: 'dict' object has no attribute 'normalize_actions'

I guess I lost the pre-processing this way…

Any ideas anyone?
I would have thought that deploying the model for inference/evaluation that has already been trained is a key purpose of all of this.

Hi @NDR008,

Cool work!
Config dicts are legacy. We started deprecating them because RLlib needs to do quite a bit of validation, among a couple of other reasons.
Since you are using PPO (and PPOConfig), you can turn your new_config into a PPOConfig with PPOConfig.from_dict(new_config).
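
Something along these lines (an untested sketch, reusing the model.obs_space / model.action_space you already extracted):

from ray.rllib.algorithms.ppo import PPOConfig

# Rebuild the "env-less" config as a PPOConfig object instead of a plain dict,
# so attribute lookups like config.normalize_actions work again.
new_config = PPOConfig.from_dict({
    "env": None,  # no real env needed for pure inference
    "observation_space": model.obs_space,
    "action_space": model.action_space,
})

algo.config = new_config  # replaces the plain dict that caused the AttributeError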

Let me know how it goes!

I would have thought that deploying the model for inference/evaluation that has already been trained is a key purpose of all of this.

Indeed. Our own abstractions (Policy/Model) have made this a little harder than it needs to be in the past, though. Keep your eyes open for the new RLModules API (RL Modules (Alpha) — Ray 2.8.0) - one of our key reasons for going for this API is to make inference and serving easier.

Have a great day!

This is an unintuitive part of RLlib.
Policy.from_checkpoint() allows loading algorithm checkpoints. In this case it will return a dict of policies. You can grab the single policy from that dict and run inference with it.
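
For example (an untested sketch; "default_policy" is the standard single-agent policy ID and is an assumption here):

from ray.rllib.policy.policy import Policy

# Loading an *algorithm* checkpoint via Policy.from_checkpoint() returns a
# dict mapping policy IDs to Policy objects, not a single Policy.
policies = Policy.from_checkpoint(
    "C:/Users/nadir/ray_results/PPO_gt-rtgym-env-v1_2023-05-13_00-31-13_kmaa6ii/checkpoint_000161"
)
my_restored_policy = policies["default_policy"]

# Policy.compute_single_action() returns (action, state_outs, extra_fetches).
action, _, _ = my_restored_policy.compute_single_action(obs)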

And a heads up: Policy does not hold a preprocessor and does not normalize/unsquash anything.
So be aware that usually you should use the Algorithm (in this case PPO) to compute actions!

How can I use the algorithm to compute the action?

The Algorithm class has almost the same API.
You can use Algorithm.compute_single_action or Algorithm.compute_actions.
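
For example, with an observation coming from your own env instance (sketch):

# The observation must be passed explicitly; calling compute_single_action()
# with no arguments raises an AssertionError asking for an observation or input_dict.
obs, info = env.reset()
terminated = truncated = False
while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action)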


I load my environment config and re-register it:

if not debugAsGym:
    def env_creator(env_config):
        env = gymnasium.make("real-time-gym-v1", config=env_config)
        return env  # return an env instance

    from ray.tune.registry import register_env
    register_env("gt-rtgym-env-v1", lambda config: env_creator(my_config)) 

I then load the checkpoint:

from ray.rllib.algorithms.algorithm import Algorithm
algo = Algorithm.from_checkpoint("C:/Users/mrX/ray_results/PPO_gt-rtgym-env-v1_2023-05-19_07-37-37z3d6v2w2/checkpoint_002000")

Then Ray reloads things, including a connection to my environment.

(RolloutWorker pid=12400) GT Real Time instantiated
(RolloutWorker pid=12400) GT AI Server instantiated for rtgym
(RolloutWorker pid=12400) still simple reward system
(RolloutWorker pid=12400) starting up on localhost port 9999
(RolloutWorker pid=12400) Waiting for a connection
(RolloutWorker pid=12400) Connection from ('127.0.0.1', 57007)
2023-05-21 13:50:06,340	INFO trainable.py:172 -- Trainable.setup took 11.294 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.

But I do not understand how to use the algorithm to take actions…

algo.compute_single_action()

Leads to:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
j:\git\TensorFlowPSX\Py\rrlib_experiments_PPO_mode4_Supra_Drag.ipynb Cell 14 in 1
----> 1 algo.compute_single_action()

File c:\Users\mrX\anaconda3\envs\GTAI2\lib\site-packages\ray\rllib\algorithms\algorithm.py:1526, in Algorithm.compute_single_action(self, observation, state, prev_action, prev_reward, info, input_dict, policy_id, full_fetch, explore, timestep, episode, unsquash_action, clip_action, unsquash_actions, clip_actions, **kwargs)
   1524     observation = input_dict[SampleBatch.OBS]
   1525 else:
-> 1526     assert observation is not None, err_msg
   1528 # Get the policy to compute the action for (in the multi-agent case,
   1529 # Trainer may hold >1 policies).
   1530 policy = self.get_policy(policy_id)

AssertionError: Provide either `input_dict` OR [`observation`, ...] as args to Trainer.compute_single_action!

I cannot pass the environment’s observation to it, since my environment session was already triggered while re-loading the algorithm :frowning:

Hi @NDR008,

If this was causing me such an issue for so long this is what I would do.

Create a Dummy environment and register it with the same name as the one used for training.

Restore from checkpoint. This will now create and reset dummy environments. The rllib RandomEnv is a good candidate here.

Create my real environment and then compute actions in a loop as you currently have.


manny is right, you’ll need to hack this for now.
A dummy environment sounds straightforward.

Will the dummy env have to have the same observation space / action space?

Could you give me some boilerplate code on how to use the RandomEnv?

Thanks in advance.

@NDR008 Yes.
You can import the random env from here: ray/random_env.py at master · ray-project/ray · GitHub
The file also contains two child classes of RandomEnv that show how to use it.
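
For boilerplate, something along these lines should work (untested sketch; the spaces below are placeholders for your real observation/action spaces, my_config is the rtgym config from earlier, and the RandomEnv import path can differ between Ray versions):

import gymnasium
import ray
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.examples.env.random_env import RandomEnv
from ray.tune.registry import register_env

# Placeholder spaces -- replace with the real env's observation/action spaces.
dummy_spaces = {
    "observation_space": gymnasium.spaces.Box(low=-1.0, high=1.0, shape=(10,)),
    "action_space": gymnasium.spaces.Box(low=-1.0, high=1.0, shape=(3,)),
}

# Register the dummy under the SAME name used during training, so the restored
# RolloutWorkers instantiate RandomEnv instead of connecting to the emulator.
register_env("gt-rtgym-env-v1", lambda config: RandomEnv(dummy_spaces))

ray.init()
algo = Algorithm.from_checkpoint("C:/Users/mrX/ray_results/PPO_gt-rtgym-env-v1_2023-05-19_07-37-37z3d6v2w2/checkpoint_002000")

# Now create the real rtgym environment manually and run the usual loop.
env = gymnasium.make("real-time-gym-v1", config=my_config)
obs, info = env.reset()
terminated = truncated = False
episode_reward = 0
while not terminated and not truncated:
    action = algo.compute_single_action(obs)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward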