Mini forum guide/self-help guide

Welcome to the forums! :slightly_smiling_face:

This post gives general advice on how to resolve issues with RLlib more quickly.
If you have a question regarding APIs or want to give feedback, feel free to skip it.
What follows is a very general approach that applies to most problems in RLlib.

Guide:

  1. Read the complete error message once so that you don’t miss anything important.
    a. Error messages may be raised on remote processes and result in longer stack traces that need to be examined carefully.
    b. Errors may also not be raised in the order you might expect.

  2. Chip away pieces of code that don’t produce the error.
    a. For example, if you are using Ray Tune in addition to RLlib, see if the issue occurs without Ray Tune. If you use your own environment, see if you can reproduce with a gymnasium environment and so on.
    b. In many instances, this will isolate the issue to a level where the cause is already obvious.

  3. If you have isolated the issue by chipping away pieces, you should be left with a minimal reproduction script.
    a. Set a breakpoint close to where the error is raised and dig around.
    b. Test your assumptions about what is happening. It may be useful to modify RLlib’s code a little and do a few experiments (see tips below).

  4. If you cannot resolve the issue this way, ask for help, and include the minimal reproduction script from above along with anything you found out along the way.
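
As an illustration of step 2, here is a minimal sketch of how you might exercise a custom environment on its own, with neither RLlib nor Ray Tune involved. `ToyEnv` and `smoke_test` are hypothetical names made up for this example; `ToyEnv` only imitates the gymnasium calling convention (`reset()` returns `(obs, info)`, `step()` returns a 5-tuple) and stands in for your own environment.

```python
class ToyEnv:
    """Hypothetical stand-in for your own environment.

    It imitates the gymnasium calling convention without importing it:
    reset() -> (obs, info), step() -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0 if action == 1 else 0.0
        terminated = self.t >= self.horizon
        return obs, reward, terminated, False, {}


def smoke_test(env, sample_action, max_steps=100):
    """Roll out one episode by hand and check the basic return shapes."""
    result = env.reset()
    assert isinstance(result, tuple) and len(result) == 2, "reset() should return (obs, info)"

    steps = 0
    total_reward = 0.0
    for _ in range(max_steps):
        out = env.step(sample_action())
        assert isinstance(out, tuple) and len(out) == 5, "step() should return a 5-tuple"
        obs, reward, terminated, truncated, info = out
        assert isinstance(terminated, bool) and isinstance(truncated, bool)
        assert isinstance(info, dict)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return steps, total_reward


# Always choose action 1; the toy episode ends after `horizon` steps.
print(smoke_test(ToyEnv(horizon=5), lambda: 1))  # (5, 5.0)
```

If a check like this already fails, the problem is in the environment and not in RLlib. If it passes, swapping your environment for a built-in gymnasium one is the next piece to chip away.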

Tools/tips:

  • If you set num_rollout_workers=0 and don’t use Ray Tune, most of your RLlib experiments should run in a single process, and you can debug without worrying about distributed execution.
  • When debugging inside RLlib, you can carefully make changes to your local RLlib installation (and revert them afterwards). It might be useful to deploy a development environment to be able to edit RLlib conveniently and contribute fixes.
  • When debugging the algorithmic side of things, consider John Schulman’s nuts and bolts script as a starting point.
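
To make the first tip concrete, the following is a rough sketch of a single-process setup without Ray Tune. It assumes an RLlib version whose config object exposes `.rollouts(num_rollout_workers=...)`, matching the setting named above; other releases may name this setting differently, so check the docs for your installed version. PPO, `CartPole-v1`, and the function names are arbitrary choices for illustration.

```python
def build_debug_algo():
    """Build a PPO algorithm that samples in the local process (sketch)."""
    # Imported inside the function so the sketch can be read and loaded
    # without Ray installed. Assumes a Ray 2.x release that still provides
    # AlgorithmConfig.rollouts().
    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment(env="CartPole-v1")
        # No remote rollout workers: sampling runs in the driver process,
        # so ordinary breakpoints work during sampling.
        .rollouts(num_rollout_workers=0)
    )
    return config.build()


def debug_one_iteration():
    """Run a single training iteration under the debugger."""
    algo = build_debug_algo()
    breakpoint()  # opens pdb; step into algo.train() from here
    result = algo.train()
    algo.stop()
    return result
```

Calling `debug_one_iteration()` from a plain Python script (no Ray Tune) stops in the debugger just before the first training iteration, which is a convenient place to start digging around.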

Happy debugging! :slight_smile: