I'm migrating from RLlib v1.13.0 to v2.2.0. I have finished porting the configuration to the new version and everything else works, but I cannot save the policies in the checkpoints. I receive the following message:
-- Can not figure out a durable policy name for <class 'ray.rllib.policy.eager_tf_policy.DQNTFPolicy_eager_traced'>. You are probably trying to checkpoint a custom policy. Raw policy class may cause problems when the checkpoint needs to be loaded in the future. To fix this, make sure you add your custom policy in rllib.algorithms.registry.POLICIES.
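To make the warning concrete, here is a minimal pure-Python sketch of the kind of lookup it describes. It assumes that a policy's durable name is found by matching its class against a name-to-class registry. Everything in it (`POLICIES`, `durable_policy_name`, `make_traced_variant`, and the stand-in classes) is hypothetical and is not RLlib's actual implementation. The `_eager_traced` suffix in the reported class name suggests that the policy class is generated at runtime from `DQNTFPolicy`, so a class generated that way would not match any registered entry.

```python
# Hypothetical illustration only: these names are NOT RLlib's real internals.
from typing import Dict, Optional, Type


class Policy:
    """Stand-in for a policy base class."""


class DQNTFPolicy(Policy):
    """Stand-in for a built-in policy class."""


# Hypothetical registry mapping durable string names to policy classes,
# analogous in spirit to the POLICIES registry named in the warning.
POLICIES: Dict[str, Type[Policy]] = {"DQNTFPolicy": DQNTFPolicy}


def durable_policy_name(cls: Type[Policy]) -> Optional[str]:
    """Return the registry name whose class is exactly `cls`, else None."""
    for name, registered in POLICIES.items():
        if registered is cls:
            return name
    return None


def make_traced_variant(cls: Type[Policy]) -> Type[Policy]:
    """Create a new subclass at runtime with a suffixed class name."""
    return type(cls.__name__ + "_eager_traced", (cls,), {})


traced_cls = make_traced_variant(DQNTFPolicy)
print(durable_policy_name(DQNTFPolicy))  # DQNTFPolicy
print(durable_policy_name(traced_cls))   # None: no durable name is found
```

Because the lookup compares class identity, even a subclass of a registered policy has no durable name unless it is registered itself.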
I get the same error with the tf, tf2, and torch frameworks. I'm using DQNConfig with the following parameter specification:
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.policy.policy import PolicySpec

algo = (
    DQNConfig()
    # Training options
    .training(
        train_batch_size=args.train_batch_size,
        model={
            "fcnet_hiddens": args.model_fcnet_hiddens,
            "fcnet_activation": args.model_fcnet_activation,
        },
        # DQN-specific training options
        n_step=args.model_n_step,
        hiddens=args.model_hiddens,
        replay_buffer_config={
            "learning_starts": args.learning_starts,
            "capacity": args.capacity,
        },
    )
    # Environment options
    # Deep learning framework options
    .framework(framework=args.framework)
    # Rollout worker options
    .rollouts(
        num_rollout_workers=args.num_workers,
        rollout_fragment_length=args.rollout_fragment_length,
        recreate_failed_workers=args.recreate_failed_workers,
    )
    # Exploration options
    .exploration(
        explore=args.explore,
        exploration_config={
            "type": args.exploration_config_type,
            "initial_epsilon": args.exploration_config_initial_epsilon,
            "final_epsilon": args.exploration_config_final_epsilon,
            # Timesteps over which to anneal epsilon.
            "epsilon_timesteps": args.exploration_config_epsilon_timesteps,
        },
    )
    # Options for training with offline data
    .offline_data(
        input_=_input,  # Use the `PolicyServerInput` to generate experiences.
    )
    # Options for training multiple agents
    .multi_agent(
        policies={
            "DualSetPoint": PolicySpec(
                policy_class=None,  # infer automatically from Algorithm
                observation_space=args.DSP_obs,
                action_space=args.DSP_acc,
                config={"gamma": args.DSP_gamma, "lr": args.DSP_lr},
            ),
            "NorthWindow": PolicySpec(
                policy_class=None,  # infer automatically from Algorithm
                observation_space=args.NW_obs,
                action_space=args.NW_acc,
                config={"gamma": args.NW_gamma, "lr": args.NW_lr},
            ),
            "SouthWindow": PolicySpec(
                policy_class=None,  # infer automatically from Algorithm
                observation_space=args.SW_obs,
                action_space=args.SW_acc,
                config={"gamma": args.SW_gamma, "lr": args.SW_lr},
            ),
            "NorthWindowBlind": PolicySpec(
                policy_class=None,  # infer automatically from Algorithm
                observation_space=args.NWB_obs,
                action_space=args.NWB_acc,
                config={"gamma": args.NWB_gamma, "lr": args.NWB_lr},
            ),
        },
        policy_mapping_fn=policy_mapping_fn,
    )
    # Reporting options
    .reporting(min_sample_timesteps_per_iteration=args.timesteps_per_iteration)
    # Options for saving and restoring checkpoints
    .checkpointing(export_native_model_files=args.export_native_model_files)
    # Debugging options
    .debugging(log_level=args.log_level, seed=args.seed)
    # Options for adding callbacks to algorithms
    .callbacks(
        # Create a "chatty" client/server or not.
        callbacks_class=MyCallbacks if args.callbacks_verbose else None,
    )
    # Resource options
    .resources(num_gpus=args.num_gpus)
)
How can I fix this? I have never registered a policy before. Thanks!