Passing trained agents into Trainable

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.


Is there a way to pass trained trainers, such as PPOTrainer, into a custom environment and then use that environment to train another?

My idea is to train an agent, which I call the “supervisor”, to select a suitable trained agent to compute the action for a given observation.

For example, suppose agent “A” has been trained to trade stocks during a bull market and agent “B” has been trained to trade stocks during a bear market. A “supervisor” is then trained to select either “A” or “B” to trade, based on the data the “supervisor” observes.

Hi @Kittiwin-Kumlungmak ,

Is there a way to pass trained trainers, such as PPOTrainer, into a custom environment and then use that environment to train another?

This sounds very much like a standard offline RL workflow to me. You train one algorithm, for example PPO, and collect data that you compile into a dataset. This dataset can then be used to train an offline algorithm.
This is possible in RLlib.
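As an illustration of the data-collection half of that workflow, here is a minimal, framework-free sketch that rolls a frozen policy over recorded price series and logs each transition as one JSON object per line. The reward (one-step price change while holding), the field names, and both helper functions are hypothetical; this is not RLlib's own SampleBatch format. Within RLlib itself, the collecting run can write experiences through its `output` config setting, and an offline algorithm (e.g. MARWIL or CQL) can read them back through `input`.

```python
import json
from typing import Callable, Dict, List


def collect_transitions(policy: Callable[[float], int],
                        price_series: List[List[float]]) -> List[Dict]:
    """Roll a frozen policy over recorded price series and log every step.

    Hypothetical schema: action 1 = hold the stock for one step, 0 = stay out.
    """
    rows = []
    for eps_id, prices in enumerate(price_series):
        for t in range(len(prices) - 1):
            obs = prices[t]
            action = policy(obs)
            # Hypothetical reward: one-step price change, earned only when holding.
            reward = (prices[t + 1] - prices[t]) * action
            rows.append({
                "eps_id": eps_id,
                "t": t,
                "obs": obs,
                "action": action,
                "reward": reward,
                "done": t == len(prices) - 2,  # last transition of the episode
            })
    return rows


def write_jsonl(rows: List[Dict], path: str) -> None:
    """Write one JSON object per line, a common layout for offline datasets."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```

For example, `collect_transitions(lambda obs: 1, [[1.0, 2.0, 4.0]])` yields two transitions with rewards `1.0` and `2.0`.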

As I understand your idea, these three agents together form one “large” agent that, in the end, acts on a single environment. You can train a network on your stock market data that outputs “bull” or “bear” in a supervised fashion and then incorporate this network into your agent, similar to the following pseudocode:

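# Freeze the supervised climate predictor (PyTorch) so RL updates leave it unchanged
supervised_climate_predictor.requires_grad_(False)

def select_action(obs):
    market_climate = supervised_climate_predictor(obs)
    # Delegate to the sub-network trained for the predicted market climate
    if market_climate == BEAR:
        return bear_network(obs)
    return bull_network(obs)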

Best of luck! 🙂

Hello @arturn

I think your suggestion is not exactly what I have in mind, or I may not fully understand it. I want to train my “supervisor” with reinforcement learning rather than in a supervised fashion. My idea is for the “supervisor” to learn to select the right agent, “bull” or “bear”, for trading at the right time. Also, in this case, “bull” and “bear” have already been trained separately.

I found this discussion and I think I can do something similar for my project by passing “bull” and “bear” as policies into a multi-agent environment.
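To make the multi-agent idea concrete, here is a minimal sketch of the turn order such an environment might use. It loosely follows the dict-per-agent `(obs, rewards, dones, infos)` convention of RLlib 1.x's `MultiAgentEnv` but deliberately does not import Ray, so the class name, agent ids, and trading reward are all hypothetical. The supervisor acts first and picks a sub-agent; the chosen sub-agent then trades; only the supervisor is rewarded. In an RLlib 1.x multi-agent config, the `policy_mapping_fn` and `policies_to_train` settings are where you would map agent ids to policies and keep the pretrained “bull” and “bear” weights frozen.

```python
from typing import Dict, List, Tuple

SUB_AGENTS = ["bull", "bear"]  # hypothetical ids of the two pretrained agents


class SupervisorEnvSketch:
    """Framework-free sketch of a two-level, turn-based multi-agent episode."""

    def __init__(self, prices: List[float]):
        self.prices = prices
        self.t = 0
        self.chosen = None

    def reset(self) -> Dict[str, float]:
        self.t, self.chosen = 0, None
        # Only the supervisor acts at the start of the episode.
        return {"supervisor": self.prices[0]}

    def step(self, action_dict: Dict[str, int]) -> Tuple[dict, dict, dict, dict]:
        if "supervisor" in action_dict:
            # Supervisor turn: its Discrete(2) action selects a sub-agent,
            # which receives the same observation and acts next.
            self.chosen = SUB_AGENTS[action_dict["supervisor"]]
            return {self.chosen: self.prices[self.t]}, {}, {"__all__": False}, {}
        # Sub-agent turn: action 1 = hold the stock for one step, 0 = stay out.
        trade = action_dict[self.chosen]
        reward = (self.prices[self.t + 1] - self.prices[self.t]) * trade
        self.t += 1
        done = self.t >= len(self.prices) - 1
        # The reward goes to the supervisor, the only policy being trained.
        obs = {"supervisor": self.prices[self.t]}
        return obs, {"supervisor": reward}, {"__all__": done}, {}


def policy_mapping_fn(agent_id, *args, **kwargs):
    # Each agent id uses the policy of the same name: supervisor, bull, bear.
    return agent_id


# Only the supervisor is updated; bull and bear keep their pretrained weights.
POLICIES_TO_TRAIN = ["supervisor"]
```

With prices `[1.0, 2.0, 4.0]`, stepping `{"supervisor": 1}` hands the observation to `"bear"`, and the following `{"bear": 1}` step credits the supervisor with a reward of `1.0`.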

What do you think?

Hi @Kittiwin-Kumlungmak, I’ve done something broadly similar in the past, so maybe I can help. What I did once was load an agent purely inside the environment, i.e. create an RLlib Policy object within the environment and then call policy.compute_single_action(). You could do this for each of the two policies to achieve what you want. If you do this, RLlib will never even see those policies, so, of course, they won’t get trained. The other option would be to load the pretrained policies inside your supervisor policy class, but I’m less sure how the details of that would work.
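The environment route can be sketched as a single-agent environment whose action is simply the index of the pretrained policy that should trade. This is a minimal sketch under stated assumptions: `FrozenPolicyStub` is a hypothetical stand-in for a restored RLlib `Policy` (like `Policy.compute_single_action()`, it returns a tuple whose first element is the action), and the price-based reward is invented for illustration. A real version would typically subclass `gym.Env` with an action space of `Discrete(len(sub_policies))`.

```python
from typing import List, Sequence, Tuple


class FrozenPolicyStub:
    """Hypothetical stand-in for a restored RLlib Policy object."""

    def __init__(self, action: int):
        self.action = action

    def compute_single_action(self, obs) -> Tuple[int, list, dict]:
        # Mirrors the (action, state_outs, info) tuple shape.
        return self.action, [], {}


class SupervisorEnv:
    """Single-agent env: the action picks which frozen sub-policy trades."""

    def __init__(self, sub_policies: Sequence, prices: List[float]):
        self.sub_policies = list(sub_policies)  # e.g. [bull_policy, bear_policy]
        self.prices = prices
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.prices[0]

    def step(self, action: int):
        obs = self.prices[self.t]
        # Delegate the trade to the chosen pretrained policy. RLlib never sees
        # these policies, so their weights are never updated.
        trade, _, _ = self.sub_policies[action].compute_single_action(obs)
        # Hypothetical reward: one-step price change when holding (trade == 1).
        reward = (self.prices[self.t + 1] - obs) * trade
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.prices[self.t], reward, done, {"chosen": action, "trade": trade}
```

For example, with `[FrozenPolicyStub(1), FrozenPolicyStub(0)]` as the sub-policies and prices `[1.0, 3.0, 2.0]`, choosing policy 0 on the first step yields a reward of `2.0`. Keeping the sub-policies inside the environment means the supervisor can be trained with any single-agent algorithm, at the cost of RLlib having no visibility into them.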

If you go with the environment route, there are a few things you need to do that took me a while to figure out, so maybe I can help you save some time here:

  1. You need to manually get the preprocessor for your observation space, e.g.
     self.preprocessor = ray.rllib.models.ModelCatalog.get_preprocessor_for_space(
         obs_space, config.get("model")
     )
  2. If you are using TensorFlow, you want a separate variable scope:
     with tf1.variable_scope("PretrainedRLLibPolicy" + str(id(self))):
         self.policy = ray.rllib.agents.a3c.a3c_tf_policy.A3CTFPolicy(
             obs_space=obs_space, action_space=ac_space, config=config
         )
  3. You can then load the weights and set them in the policy object. The following loads them from a Ray Tune checkpoint:
     with open(checkpoint_file, "rb") as f:
         checkpoint_data = pickle.load(f)
     model_data = pickle.loads(checkpoint_data["worker"])["state"][
  4. To compute an action you can now do:
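The preprocessing step and the action-computation step above could be bundled into a small helper. This is a minimal sketch, assuming the preprocessor exposes `transform(observation)` (as RLlib preprocessors do) and the policy's `compute_single_action()` returns a tuple whose first element is the action. `PretrainedPolicyWrapper`, `OneHotStub`, and `ArgmaxPolicyStub` are hypothetical names; the two stubs exist only so the sketch runs without Ray.

```python
from typing import Any, List


class PretrainedPolicyWrapper:
    """Hypothetical helper pairing a preprocessor with a restored policy."""

    def __init__(self, preprocessor: Any, policy: Any):
        self.preprocessor = preprocessor
        self.policy = policy

    def act(self, raw_obs: Any) -> Any:
        # The policy is called directly rather than through RLlib's rollout
        # machinery, so apply the observation preprocessing here ourselves.
        flat_obs = self.preprocessor.transform(raw_obs)
        action, _state_out, _info = self.policy.compute_single_action(flat_obs)
        return action


class OneHotStub:
    """Hypothetical preprocessor: one-hot encodes a discrete observation."""

    def __init__(self, n: int):
        self.n = n

    def transform(self, obs: int) -> List[float]:
        return [1.0 if i == obs else 0.0 for i in range(self.n)]


class ArgmaxPolicyStub:
    """Hypothetical policy: picks the index of the largest input feature."""

    def compute_single_action(self, obs: List[float]):
        return max(range(len(obs)), key=lambda i: obs[i]), [], {}
```

With these stubs, `PretrainedPolicyWrapper(OneHotStub(3), ArgmaxPolicyStub()).act(2)` returns `2`. Inside an environment, you would create one such wrapper per pretrained policy and call `act()` on each observation.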

Some of the details might have changed a little in Ray 2.0. Hope this helps!