Tune as part of curriculum training

I think one way to do this is with callbacks.
Notice that in the example Artur gave you, what we essentially do is load a pre-trained policy checkpoint into the algorithm and use it as the baseline for further training.

You can actually do that with a callback like on_algorithm_start():

You have access to the algorithm there, so you just need to create a policy from the checkpoint you want to resume from and call algorithm.add_policy(policy=<your reloaded policy>).
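Roughly, that could look like the sketch below. This is just a minimal sketch, assuming Ray 2.x RLlib, where the hook on DefaultCallbacks is named on_algorithm_init() (the exact name may differ between Ray versions); the checkpoint path and policy ID are placeholders you'd swap for your own:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.policy import Policy


class LoadPretrainedPolicy(DefaultCallbacks):
    # Runs once, right after the Algorithm has been set up.
    def on_algorithm_init(self, *, algorithm, **kwargs):
        # Rebuild a policy from an existing policy checkpoint
        # (this path is a placeholder -- point it at your own checkpoint).
        restored = Policy.from_checkpoint(
            "/path/to/old/checkpoint/policies/default_policy"
        )
        # Register the reloaded policy with the running algorithm so that
        # further training continues from the pre-trained weights.
        algorithm.add_policy(
            policy_id="pretrained_policy",
            policy=restored,
        )
```

You'd then register the callback class on your config (e.g. `config.callbacks(LoadPretrainedPolicy)`) before building the algorithm.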

Give this a try. Your project sounds really exciting actually :slight_smile: