I am trying to train a SAC RL agent on the new Ray 2.40 API stack with prioritized replay (PER) enabled.
This was usually done by passing a replay_buffer_config with the appropriate parameters:
from ray.rllib.algorithms.sac import SACConfig

sac_config = (
    SACConfig()
    .environment("LunarLanderContinuous-v3")
    .env_runners(
        num_env_runners=5,
        num_gpus_per_env_runner=0.10,
    )
    .learners(
        num_learners=1,
        num_gpus_per_learner=0.5,
    )
    .framework("torch")
    .training(
        actor_lr=CONFIG['sac']['actor_lr'],
        critic_lr=CONFIG['sac']['critic_lr'],
        gamma=CONFIG['sac']['gamma'],
        train_batch_size=CONFIG['sac']['batch_size'],
        tau=CONFIG['sac']['tau'],
        initial_alpha=0.2,
        target_entropy='auto',
        twin_q=True,
        replay_buffer_config={
            "type": "MultiAgentReplayBuffer",   # Default buffer type
            "prioritized_replay": True,         # Enable PER
            "training_starts": 100 * 180,       # Delay learning until ~100 episodes
            "prioritized_replay_alpha": 0.6,    # Level of prioritization
            "prioritized_replay_beta": 0.4,     # Importance-sampling weights
            "prioritized_replay_eps": 1e-6,     # Small constant for stability
            "capacity": 1000000,                # Buffer size
        },
    )
    .evaluation(
        evaluation_num_workers=1,
        evaluation_interval=CONFIG['training']['evaluation_interval'],
    )
)
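For context, I run this config with the usual build-and-train loop (minimal sketch below; the iteration count is just a placeholder):

# Minimal usage sketch: build the algorithm from the config and train it.
algo = sac_config.build()
for _ in range(10):          # placeholder number of training iterations
    result = algo.train()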
However, once I updated to Ray 2.40, I get this error:
ValueError: When using the new EnvRunner API the replay buffer must be of type EpisodeReplayBuffer.
Alright, so I changed the type to "EpisodeReplayBuffer", and now I get an error that essentially says prioritized_replay is not a recognized key.
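For reference, this is roughly the buffer config I switched to (everything else in the SACConfig above is unchanged):

replay_buffer_config={
    "type": "EpisodeReplayBuffer",    # required by the new EnvRunner API
    "prioritized_replay": True,       # this key is now rejected
    "prioritized_replay_alpha": 0.6,
    "prioritized_replay_beta": 0.4,
    "prioritized_replay_eps": 1e-6,
    "capacity": 1000000,
}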
Would appreciate some help with this, as using PER for my buffer would greatly improve training performance.