Cannot save policies when checkpointing

Hi,
I’m in the process of migrating from RLlib v1.13.0 to v2.2.0. I finished the configuration with the new version and everything works fine, but I cannot save the policies in the checkpoints. I receive the following message:

-- Can not figure out a durable policy name for <class 'ray.rllib.policy.eager_tf_policy.DQNTFPolicy_eager_traced'>. You are probably trying to checkpoint a custom policy. Raw policy class may cause problems when the checkpoint needs to be loaded in the future. To fix this, make sure you add your custom policy in rllib.algorithms.registry.POLICIES.

I had the same error with the tf, tf2, and torch frameworks. I’m using DQNConfig with the following parameter specification:

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.policy.policy import PolicySpec

algo = (
    DQNConfig()
    # Training options
    .training(
        train_batch_size=args.train_batch_size,
        model={  # DQN model options
            "fcnet_hiddens": args.model_fcnet_hiddens,
            "fcnet_activation": args.model_fcnet_activation,
        },
        n_step=args.model_n_step,
        hiddens=args.model_hiddens,
        replay_buffer_config={
            "learning_starts": args.learning_starts,
            "capacity": args.capacity,
        },
    )
    # Environment options
    .environment(
        env=args.environment,
    )
    # Deep learning framework options
    .framework(
        framework=args.framework,
        eager_tracing=True,
    )
    # Rollout worker options
    .rollouts(
        num_rollout_workers=args.num_workers,
        rollout_fragment_length=args.rollout_fragment_length,
        recreate_failed_workers=args.recreate_failed_workers,
    )
    # Exploration options
    .exploration(
        explore=args.explore,
        exploration_config={
            "type": args.exploration_config_type,
            "initial_epsilon": args.exploration_config_initial_epsilon,
            "final_epsilon": args.exploration_config_final_epsilon,
            # Timesteps over which to anneal epsilon.
            "epsilon_timesteps": args.exploration_config_epsilon_timesteps,
        },
    )
    # Options for training with offline data
    .offline_data(
        input_=_input,  # Use the `PolicyServerInput` to generate experiences.
    )
    # Options for training multiple agents
    .multi_agent(
        policies={
            "DualSetPoint": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.DSP_obs,
                action_space=args.DSP_acc,
                config={"gamma": args.DSP_gamma, "lr": args.DSP_lr},
            ),
            "NorthWindow": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.NW_obs,
                action_space=args.NW_acc,
                config={"gamma": args.NW_gamma, "lr": args.NW_lr},
            ),
            "SouthWindow": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.SW_obs,
                action_space=args.SW_acc,
                config={"gamma": args.SW_gamma, "lr": args.SW_lr},
            ),
            "NorthWindowBlind": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.NWB_obs,
                action_space=args.NWB_acc,
                config={"gamma": args.NWB_gamma, "lr": args.NWB_lr},
            ),
        },
        policy_mapping_fn=policy_mapping_fn,
    )
    # Reporting options
    .reporting(
        min_sample_timesteps_per_iteration=args.timesteps_per_iteration,
    )
    # Options for saving and restoring checkpoints
    .checkpointing(
        export_native_model_files=args.export_native_model_files,
    )
    # Debugging options
    .debugging(
        log_level=args.log_level,
        seed=args.seed,
    )
    # Options for adding callbacks to the algorithm
    .callbacks(
        callbacks_class=MyCallbacks if args.callbacks_verbose else None,  # Create a "chatty" client/server or not.
    )
    # Resource options
    .resources(
        num_gpus=args.num_gpus,
    )
    .build()
)
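
The warning is printed when the checkpoint is written. A minimal sketch of that step (the training loop and the output directory below are placeholders, not my actual script):

# Run a training iteration, then write a checkpoint (placeholder output path).
algo.train()
checkpoint_path = algo.save("./rllib_checkpoints")
print("Checkpoint written to", checkpoint_path)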

How can I fix this? I have never registered a policy before. Thanks!

Hi @hermmanhender, so far there is no way to register custom policies. An explanation can be found in this issue by @sven1977.

Basically, the warning is there to tell you that you are responsible for the backward compatibility of such a policy class yourself, as RLlib cannot handle this for you.
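
If it helps: as I understand the current behavior, this is only a warning, and the checkpoint should still be written and restorable as long as the code that defines the policy is importable when you load it. A minimal sketch, assuming Ray 2.2's `Algorithm.save()` / `Algorithm.from_checkpoint()` and the `algo` and policy IDs from your snippet (the output path is a placeholder):

from ray.rllib.algorithms.algorithm import Algorithm

# Write a checkpoint (the warning may still be logged); path is a placeholder.
checkpoint_path = algo.save("./rllib_checkpoints")

# Later, in a process where the same policy code/config is importable,
# restore the whole Algorithm and grab an individual policy by its ID.
restored = Algorithm.from_checkpoint(checkpoint_path)
dual_set_point_policy = restored.get_policy("DualSetPoint")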