Running a random policy with RLlib

Hi, I want to run a random policy (ray.rllib.examples.policy.random_policy) on the CartPole-v1 environment. So far, I have built a custom trainer and want to run it via the ray.tune.run interface. Unfortunately, the execution fails with the error message:

File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 471, in load_batch_into_buffer
raise NotImplementedError

I’ve looked into overriding the Policy class (ray.rllib.policy.policy) but am unsure how to implement the methods load_batch_into_buffer, get_num_samples_loaded_into_buffer, etc. This seems like an unnecessarily complicated way to do something so simple. Any suggestions or help would be greatly appreciated! (Ray version 1.10.0)

The minimal code to reproduce this issue is below:

import ray
from ray.rllib.examples.policy.random_policy import RandomPolicy
from ray.rllib.agents.trainer_template import build_trainer

RandomTrainer = build_trainer(
    name="RandomTrainer",
    default_policy=RandomPolicy,
)

config = {"env": "CartPole-v1", "num_gpus": 0, "num_workers": 1, "framework": "torch"}

ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)

Hi @sn73jq ,

and welcome to the Ray discussion board. I have tested your code and made some modifications so that it runs:

import ray
from ray.rllib.examples.policy.random_policy import RandomPolicy
from ray.rllib.agents.trainer_template import build_trainer

RandomTrainer = build_trainer(
    name="RandomTrainer",
    default_policy=RandomPolicy,
)

config = {"env": "CartPole-v1", "num_gpus": 0, "num_workers": 1, "framework": None}
ray.init(ignore_reinit_error=True, local_mode=True)
ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)
ray.shutdown()

First of all, RandomPolicy is a policy that cannot be trained (there is no model that could learn to make better decisions, since actions are always chosen randomly). Using torch as the framework will automatically select MultiGPUTrainOneStep() for training, and this training step requires a function that bulk-loads the data into the different devices’ memories (which makes sense for torch or tf). However, RandomPolicy has no such function defined, and neither does the super-class Policy, which simply raises NotImplementedError. That is the NotImplementedError you are seeing.
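
If you really want to keep "framework": "torch" together with the multi-GPU train step, another option is to subclass RandomPolicy and give these buffer hooks trivial implementations. This is only a rough, untested sketch assuming the Policy method signatures from Ray 1.10; the class name BufferedRandomPolicy and its bookkeeping are made up for illustration:

from ray.rllib.examples.policy.random_policy import RandomPolicy


class BufferedRandomPolicy(RandomPolicy):
    # RandomPolicy plus trivial implementations of the multi-GPU buffer
    # hooks that MultiGPUTrainOneStep() calls.

    def load_batch_into_buffer(self, batch, buffer_index=0):
        # "Load" the batch by simply remembering it and reporting its size.
        self._loaded_batch = batch
        return len(batch)

    def get_num_samples_loaded_into_buffer(self, buffer_index=0):
        batch = getattr(self, "_loaded_batch", None)
        return len(batch) if batch is not None else 0

    def learn_on_loaded_batch(self, offset=0, buffer_index=0):
        # A random policy has nothing to learn, so return empty learner stats.
        return {}

You could then pass BufferedRandomPolicy as default_policy to build_trainer instead of RandomPolicy. That said, the simple_optimizer route described below is less work.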

When the Trainer is initialized, it validates its config and, in doing so, sets the simple_optimizer parameter to True (from DEPRECATED_VALUE, @sven1977, @avnishn what is the plan here?) whenever the framework is neither tf nor torch. This in turn makes the Trainer use TrainOneStep() instead of MultiGPUTrainOneStep(). The former does not require load_batch_into_buffer() to be defined and therefore runs through. Setting framework to None triggers exactly this behavior (as does setting simple_optimizer: True directly).
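
In other words, if you prefer to keep "framework": "torch" for the RandomPolicy, it should also be enough to force the simple optimizer in the config. Untested, but based on the validation logic described above, something like:

config = {
    "env": "CartPole-v1",
    "num_gpus": 0,
    "num_workers": 1,
    "framework": "torch",
    # Forces TrainOneStep() instead of MultiGPUTrainOneStep(), so
    # load_batch_into_buffer() is never called on the RandomPolicy.
    "simple_optimizer": True,
}

ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)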

I admit that this behavior does not make much sense, but right now it has historical reasons, as the RLlib team is reworking a lot while integrating ray.train for training. I guess in the next releases these things will become better arranged (and more powerful :slight_smile: ).
