Hi,
I’m in the middle of migrating from RLlib v1.13.0 to v2.2.0. I finished the configuration with the new version and everything runs fine, but I cannot save the policies in the checkpoints. I receive the following message:
-- Can not figure out a durable policy name for <class 'ray.rllib.policy.eager_tf_policy.DQNTFPolicy_eager_traced'>. You are probably trying to checkpoint a custom policy. Raw policy class may cause problems when the checkpoint needs to be loaded in the future. To fix this, make sure you add your custom policy in rllib.algorithms.registry.POLICIES.
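For context, I trigger the checkpoint save roughly like this (a minimal sketch; the directory name is just an example):

checkpoint_path = algo.save("./checkpoints")
print(f"Checkpoint saved to {checkpoint_path}")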
I get the same error whether I use the tf, tf2, or torch framework. I’m using DQNConfig with the following parameter specification:
from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.policy.policy import PolicySpec

algo = (
    DQNConfig()
    # Training options.
    .training(
        train_batch_size=args.train_batch_size,
        model={  # DQN model options.
            "fcnet_hiddens": args.model_fcnet_hiddens,
            "fcnet_activation": args.model_fcnet_activation,
        },
        n_step=args.model_n_step,
        hiddens=args.model_hiddens,
        replay_buffer_config={
            "learning_starts": args.learning_starts,
            "capacity": args.capacity,
        },
    )
    # Environment options.
    .environment(
        env=args.environment,
    )
    # Deep learning framework options.
    .framework(
        framework=args.framework,
        eager_tracing=True,
    )
    # Rollout worker options.
    .rollouts(
        num_rollout_workers=args.num_workers,
        rollout_fragment_length=args.rollout_fragment_length,
        recreate_failed_workers=args.recreate_failed_workers,
    )
    # Exploration options.
    .exploration(
        explore=args.explore,
        exploration_config={
            "type": args.exploration_config_type,
            "initial_epsilon": args.exploration_config_initial_epsilon,
            "final_epsilon": args.exploration_config_final_epsilon,
            # Timesteps over which to anneal epsilon.
            "epsilon_timesteps": args.exploration_config_epsilon_timesteps,
        },
    )
    # Options for training with offline data.
    .offline_data(
        input_=_input,  # Use the `PolicyServerInput` to generate experiences.
    )
    # Options for training multiple agents.
    .multi_agent(
        policies={
            "DualSetPoint": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.DSP_obs,
                action_space=args.DSP_acc,
                config={"gamma": args.DSP_gamma, "lr": args.DSP_lr},
            ),
            "NorthWindow": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.NW_obs,
                action_space=args.NW_acc,
                config={"gamma": args.NW_gamma, "lr": args.NW_lr},
            ),
            "SouthWindow": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.SW_obs,
                action_space=args.SW_acc,
                config={"gamma": args.SW_gamma, "lr": args.SW_lr},
            ),
            "NorthWindowBlind": PolicySpec(
                policy_class=None,  # Infer automatically from the Algorithm.
                observation_space=args.NWB_obs,
                action_space=args.NWB_acc,
                config={"gamma": args.NWB_gamma, "lr": args.NWB_lr},
            ),
        },
        policy_mapping_fn=policy_mapping_fn,
    )
    # Reporting options.
    .reporting(
        min_sample_timesteps_per_iteration=args.timesteps_per_iteration,
    )
    # Options for saving and restoring checkpoints.
    .checkpointing(
        export_native_model_files=args.export_native_model_files,
    )
    # Debugging options.
    .debugging(
        log_level=args.log_level,
        seed=args.seed,
    )
    # Options for adding callbacks to the algorithm.
    .callbacks(
        callbacks_class=MyCallbacks if args.callbacks_verbose else None,  # Create a "chatty" client/server or not.
    )
    # Resource options.
    .resources(
        num_gpus=args.num_gpus,
    )
    .build()
)
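For completeness, `_input` and `policy_mapping_fn` are defined roughly as follows (sketches only; the server address, the port, and the assumption that agent IDs match the policy names are placeholders on my side):

from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"  # Placeholder.
SERVER_PORT = 9900  # Placeholder.

def _input(ioctx):
    # Return a PolicyServerInput so an external client can push experiences;
    # only the local worker (or the remote workers, if any) open a server.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            SERVER_ADDRESS,
            SERVER_PORT + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    return None

def policy_mapping_fn(agent_id, episode, worker, **kwargs):
    # Assuming each agent ID matches one of the policy names above.
    return agent_id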
How can I fix this? I have never registered a policy before. Thanks!
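From the message, it sounds like I am supposed to add an entry to that registry dict myself. Is something like this what it means (just a guess; I am assuming POLICIES is a plain dict and that the value is a module path like the built-in entries, so the key and path below may well be wrong)?

from ray.rllib.algorithms.registry import POLICIES

# Guess: map the eager-traced class name back to the built-in DQN TF policy module.
POLICIES["DQNTFPolicy_eager_traced"] = "dqn.dqn_tf_policy"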