Why do we need a lock in `_compute_action_helper` in torch_policy_v2.py?

How severe does this issue affect your experience of using Ray?

  • None: Just asking a question out of curiosity
```python
def _compute_action_helper(
    self, input_dict, state_batches, seq_lens, explore, timestep
):
```
I would like to know where the race condition comes from that makes the lock necessary.
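To illustrate the general kind of race a lock like this prevents, here is a toy example (not RLlib code; `ToyPolicy` and its counter are made-up names) where two threads perform an unlocked read-modify-write on shared per-object state, so increments can be lost:

```python
import threading

class ToyPolicy:
    """Toy stand-in for a policy object holding mutable shared state
    (e.g. a timestep counter or recurrent state). Not RLlib code."""

    def __init__(self):
        self.global_timestep = 0

    def compute_actions(self, n):
        for _ in range(n):
            # Unlocked read-modify-write: two threads can both read the
            # same value and both write back value + 1, losing an update.
            self.global_timestep += 1

policy = ToyPolicy()
threads = [
    threading.Thread(target=policy.compute_actions, args=(100_000,))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Whether updates are actually lost on a given run depends on thread
# scheduling (and CPython's GIL), so we only print the final count.
print(policy.global_timestep)
```

The same hazard applies to any mutable state on the policy (model weights being updated by `learn_on_batch` while a forward pass reads them), which is the usual reason such methods are serialized.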


According to `ray.rllib.utils.threading.with_lock`, it is an object-level lock. I guess it only avoids race conditions among the decorated methods (`learn_on_batch`, `compute_gradients`, etc.) of the same policy object inside torch_policy_v2.
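As a rough sketch of that pattern (simplified and assumed; the real `with_lock` in `ray.rllib.utils.threading` may differ in detail), a decorator can route every decorated method of one instance through a single per-instance lock:

```python
import threading
from functools import wraps

def with_lock(func):
    """Serialize calls to the wrapped method through the instance's
    `self._lock`. A simplified sketch of the pattern, not Ray's code."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # All @with_lock methods on this object share the same lock,
        # so they never run concurrently on one instance.
        with self._lock:
            return func(self, *args, **kwargs)
    return wrapper

class Policy:
    """Hypothetical policy class used only to demonstrate the decorator."""

    def __init__(self):
        # One lock per policy object => object-level, not class-level:
        # two different Policy instances do not block each other.
        self._lock = threading.RLock()

    @with_lock
    def compute_actions(self):
        return "actions"

    @with_lock
    def learn_on_batch(self):
        # While one thread is here, another thread calling
        # compute_actions on the *same* object blocks until it returns.
        return "learned"
```

Using an `RLock` (reentrant lock) means a locked method may call another locked method on the same object without deadlocking itself.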