Hi everyone,
I’m trying to apply RL to a real-life setup. When starting a new episode, I want to apply the policy right away, but not store the first few transitions, to avoid any transient effects. The condition would be on position, for example, and that information is part of the state. How can I modify my replay buffer's add() method, or my environment's step() method, to ignore some transitions?
Thanks in advance
Hi,
You can modify the add() method of your custom replay buffer to conditionally store transitions based on the state information. Here’s an example of how you can create a custom replay buffer by extending the base ReplayBuffer class:
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer

class CustomReplayBuffer(ReplayBuffer):
    def add(self, batch, **kwargs):
        # Only store the batch if the condition on the state holds.
        if self.should_store_transition(batch):
            super().add(batch, **kwargs)

    def should_store_transition(self, batch):
        # Implement your condition here based on the state information.
        # For example, if the position is part of the observation, check
        # whether it meets your criteria.
        return True  # or False, depending on your condition
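For instance, here is a minimal sketch of such a condition, assuming the position is the first entry of the observation vector and that batches carry an "obs" column (both are assumptions; adapt the key, index, and threshold to your environment):

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch

def should_store_transition(self, batch):
    # Hypothetical: treat obs[..., 0] as the position and only store
    # transitions once it has moved past a made-up threshold of 0.1.
    positions = np.asarray(batch[SampleBatch.OBS])[..., 0]
    return bool(np.all(np.abs(positions) > 0.1))

Keep in mind that add() may receive batches spanning several timesteps; in that case you may want to slice out the unwanted rows rather than drop the whole batch.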
Then, you can use this custom replay buffer in your RLlib configuration:
from ray.rllib.algorithms.dqn import DQNConfig

config = (
    DQNConfig()
    .environment("CartPole-v1")
    .framework("torch")  # or "tf2"
    .rollouts(num_rollout_workers=4)
    .training(
        replay_buffer_config={"type": CustomReplayBuffer},
    )
)
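You can then build and train the algorithm as usual to sanity-check the setup:

algo = config.build()
result = algo.train()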
This way, the custom replay buffer only stores transitions that satisfy the condition you define on the state.
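If you'd rather drop the first few transitions of every episode regardless of what the state looks like, another option is to count steps since the last episode boundary inside the buffer. This is only a sketch, assuming single-timestep batches with "terminateds"/"truncateds" columns (older RLlib versions use "dones"; multi-timestep batches would need slicing instead), and N_SKIP is a made-up constant:

import numpy as np
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.replay_buffers.replay_buffer import ReplayBuffer

N_SKIP = 5  # hypothetical number of transient steps to drop per episode

class SkipFirstStepsBuffer(ReplayBuffer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._steps_since_reset = 0

    def add(self, batch, **kwargs):
        # Drop transitions that fall within the transient window.
        if self._steps_since_reset >= N_SKIP:
            super().add(batch, **kwargs)
        self._steps_since_reset += 1
        # Reset the counter whenever an episode ends.
        if bool(np.any(batch[SampleBatch.TERMINATEDS])) or bool(
            np.any(batch[SampleBatch.TRUNCATEDS])
        ):
            self._steps_since_reset = 0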