Passing trained agents into Trainable

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi,

Is there a way to pass trained trainers, such as PPOTrainer, into a custom environment and then use that environment to train another agent?

My idea is to train an agent, which I call the “supervisor”, to select a suitable trained agent to compute the action for a given observation.

For example, you have an agent “A” trained to trade stocks during a bull market and another agent “B” trained to trade stocks during a bear market. Then, you have a “supervisor” that is trained to select either “A” or “B” to trade stocks based on the data the “supervisor” observes.

Hi @Kittiwin-Kumlungmak ,

Is there a way to pass trained trainers, such as PPOTrainer, into a custom environment and then use that environment to train another agent?

This sounds very much like a standard task to me. You train one algorithm, for example PPO, and collect data that you compile into a dataset. This data can then be used to train an offline algorithm.
This is possible in RLlib.
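
In code, that pipeline could look roughly like this (just a sketch against the pre-2.0 agents API; the output path and the CartPole-v0 placeholder env are stand-ins for your trading setup):

    from ray.rllib.agents.ppo import PPOTrainer
    from ray.rllib.agents.marwil import MARWILTrainer

    # 1) Train PPO online and record its experiences as JSON files on disk.
    ppo = PPOTrainer(config={
        "env": "CartPole-v0",           # placeholder, substitute your trading env
        "output": "/tmp/trading-out",   # experiences get written here
    })
    for _ in range(10):
        ppo.train()

    # 2) Train an offline algorithm (e.g. MARWIL) purely from that recorded dataset.
    offline = MARWILTrainer(config={
        "env": "CartPole-v0",           # only needed for the obs/action spaces
        "input": "/tmp/trading-out",    # read recorded experiences instead of sampling
    })
    for _ in range(10):
        offline.train()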

As I understand your idea, these three agents compose one “large” agent that acts on a single environment in the end. You could train a network on your stock market data that outputs “bull” or “bear” in a supervised fashion and then incorporate this network into your agent, similar to the following pseudocode:

# Freeze the supervised market-climate predictor so it is not trained further.
supervised_climate_predictor.requires_grad = False
[...]
# Dispatch to the sub-network that matches the predicted market climate.
market_climate = supervised_climate_predictor(obs)
if market_climate == BEAR:
    return bear_network(obs)
else:
    return bull_network(obs)

Best of luck! 🙂

Hello @arturn

I think what you suggested is not exactly what I have in mind, or maybe I don’t fully understand your suggestion. Actually, I want to train my “supervisor” in a reinforcement learning fashion rather than a supervised fashion. So, my idea is to have the “supervisor” learn to select the right agent, “bull” or “bear”, for trading at the right time. Also, “bull” and “bear” have, in this case, already been trained separately.

I found this discussion and I think I can do something similar for my project by passing “bull” and “bear” as policies into a multi-agent env, roughly along the lines of the sketch below.
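
Something like this is what I picture (only a sketch; StockMultiAgentEnv, the spaces, and the weight loading are placeholders, and the exact policy_mapping_fn signature may differ between Ray versions):

    import gym
    from ray.rllib.agents.ppo import PPOTrainer

    # Placeholder observation/action spaces; the real ones come from my env.
    trader_obs_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(10,))
    trader_act_space = gym.spaces.Discrete(3)       # e.g. buy / hold / sell
    supervisor_act_space = gym.spaces.Discrete(2)   # pick "bull" or "bear"

    config = {
        "env": "StockMultiAgentEnv",  # hypothetical registered multi-agent env
        "multiagent": {
            "policies": {
                # (policy_cls (None = default), obs_space, act_space, per-policy config)
                "supervisor": (None, trader_obs_space, supervisor_act_space, {}),
                "bull": (None, trader_obs_space, trader_act_space, {}),
                "bear": (None, trader_obs_space, trader_act_space, {}),
            },
            # Env agent ids map straight to policy ids of the same name.
            "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
            # Only the supervisor keeps learning; "bull" and "bear" stay frozen.
            "policies_to_train": ["supervisor"],
        },
    }

    trainer = PPOTrainer(config=config)
    # Copy the separately pre-trained weights into the frozen policies, e.g.
    # weights obtained earlier via get_policy().get_weights() on the restored trainers:
    # trainer.set_weights({"bull": bull_weights, "bear": bear_weights})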

What do you think?

Hi @Kittiwin-Kumlungmak, I’ve done something broadly similar in the past, so maybe I can help. What I’ve done once is load an agent purely inside the environment, i.e., create an RLlib Policy object in the environment and then call policy.compute_single_action(). You could do this for each of the two policies to achieve what you want. If you do this, RLlib will never even see the policies, and the policies won’t get trained, of course. The other thing you could do is somehow load the pre-trained policies inside your supervisor policy class - but I’m less sure how the details of that would work.

If you go with the environment route, there are a few things you need to do that took me a while to figure out, so maybe I can save you some time here (I’ve put the pieces together in a rough sketch after the list):

  1. You need to manually get the preprocessor for your observation space, e.g.
     self.preprocessor = ray.rllib.models.ModelCatalog.get_preprocessor_for_space(
         obs_space, config.get("model")
     )
  2. If using tensorflow, you want a separate variable scope:
     with tf1.variable_scope("PretrainedRLLibPolicy" + str(id(self))):
         self.policy = ray.rllib.agents.a3c.a3c_tf_policy.A3CTFPolicy(
             obs_space=obs_space, action_space=ac_space, config=config
         )
  3. You can then load the weights and set them in the policy object. The following loads them from a ray tune checkpoint:
     with open(checkpoint_file, "rb") as f:
         checkpoint_data = pickle.load(f)
     model_data = pickle.loads(checkpoint_data["worker"])["state"][
         checkpoint_agent_id
     ]
     self.policy.set_state(model_data)
  4. To compute an action you can now do:
     self.policy.compute_single_action(self.preprocessor.transform(obs))[0]
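
Putting those four pieces together, a rough sketch of a wrapper class might look like this (PretrainedPolicyWrapper is just my name for it, and I happened to use the A3C TF policy, so swap in the policy class and config of whatever algorithm you actually trained):

    import pickle

    from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy
    from ray.rllib.models import ModelCatalog
    from ray.rllib.utils.framework import try_import_tf

    tf1, tf, tfv = try_import_tf()


    class PretrainedPolicyWrapper:
        """Loads a frozen, pre-trained RLlib policy for use inside an environment."""

        def __init__(self, obs_space, action_space, config,
                     checkpoint_file, checkpoint_agent_id):
            # 1) Preprocessor that encodes raw observations the same way
            #    RLlib did during training.
            self.preprocessor = ModelCatalog.get_preprocessor_for_space(
                obs_space, config.get("model")
            )
            # 2) Build the policy in its own TF variable scope so it does not
            #    clash with the policy RLlib is currently training.
            with tf1.variable_scope("PretrainedRLLibPolicy" + str(id(self))):
                self.policy = A3CTFPolicy(
                    obs_space=obs_space, action_space=action_space, config=config
                )
            # 3) Restore the trained weights from a ray tune checkpoint.
            with open(checkpoint_file, "rb") as f:
                checkpoint_data = pickle.load(f)
            model_data = pickle.loads(checkpoint_data["worker"])["state"][
                checkpoint_agent_id
            ]
            self.policy.set_state(model_data)

        def compute_action(self, obs):
            # 4) Preprocess the raw observation and return only the action.
            return self.policy.compute_single_action(
                self.preprocessor.transform(obs)
            )[0]

Your environment could then build one wrapper for the “bull” checkpoint and one for the “bear” checkpoint and call compute_action() on whichever one the supervisor picks.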

Some of the details might have changed a little in ray 2.0. Hope this helps!