Running a random policy with RLlib

Hi, I want to run a random policy (ray.rllib.examples.policy.random_policy) on the CartPole-v1 environment. So far, I have built a custom trainer and want to run it via the ray.tune.run interface. Unfortunately, the execution fails with the error message:

File "/usr/local/lib/python3.8/dist-packages/ray/rllib/policy/policy.py", line 471, in load_batch_into_buffer
raise NotImplementedError

I’ve looked into overriding the Policy class (ray.rllib.policy.policy) but am unsure how to implement the methods load_batch_into_buffer, get_num_samples_loaded_into_buffer, etc. This seems like an unnecessarily complicated way to do something so simple. Any suggestions or help would be greatly appreciated! (Ray version 1.10.0)

The minimal code to reproduce this issue is below:

import ray
from ray.rllib.examples.policy.random_policy import RandomPolicy
from ray.rllib.agents.trainer_template import build_trainer

RandomTrainer = build_trainer(
    name="RandomTrainer",
    default_policy=RandomPolicy,
)

config = {"env": "CartPole-v1", "num_gpus": 0, "num_workers": 1, "framework": "torch"}

ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)

Hi @sn73jq ,

and welcome to the Ray discussion board. I have tested your code and made some modifications so that it runs:

import ray
from ray.rllib.examples.policy.random_policy import RandomPolicy
from ray.rllib.agents.trainer_template import build_trainer

RandomTrainer = build_trainer(
    name="RandomTrainer",
    default_policy=RandomPolicy,
)

config = {"env": "CartPole-v1", "num_gpus": 0, "num_workers": 1, "framework": None}
ray.init(ignore_reinit_error=True, local_mode=True)
ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)
ray.shutdown()

First of all, RandomPolicy is a policy that cannot be trained (there is no model that could learn to make better decisions, since actions are always chosen randomly). Using torch as the framework will automatically select MultiGPUTrainOneStep() for training, and this training step requires a function that bulk-loads the data into the different devices’ memories (which makes sense for torch or tf). However, RandomPolicy has no such function defined, and neither does the super-class Policy, which simply raises NotImplementedError. That is the NotImplementedError you are seeing.
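
If you really want to keep "framework": "torch" together with the multi-GPU train step, another option is to subclass RandomPolicy and give these buffer hooks trivial implementations. This is only a rough, untested sketch assuming the Policy method signatures from Ray 1.10; the class name BufferedRandomPolicy and its bookkeeping are made up for illustration:

from ray.rllib.examples.policy.random_policy import RandomPolicy


class BufferedRandomPolicy(RandomPolicy):
    # RandomPolicy plus trivial implementations of the multi-GPU buffer
    # hooks that MultiGPUTrainOneStep() calls.

    def load_batch_into_buffer(self, batch, buffer_index=0):
        # "Load" the batch by simply remembering it and reporting its size.
        self._loaded_batch = batch
        return len(batch)

    def get_num_samples_loaded_into_buffer(self, buffer_index=0):
        batch = getattr(self, "_loaded_batch", None)
        return len(batch) if batch is not None else 0

    def learn_on_loaded_batch(self, offset=0, buffer_index=0):
        # A random policy has nothing to learn, so return empty learner stats.
        return {}

You could then pass BufferedRandomPolicy as default_policy to build_trainer instead of RandomPolicy. That said, the simple_optimizer route described below is less work.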

When the Trainer is initialized, it validates its config and, in doing so, sets the simple_optimizer parameter to True (from DEPRECATED_VALUE, @sven1977, @avnishn what is the plan here?) whenever the framework is neither tf nor torch. This in turn makes the Trainer use TrainOneStep() instead of MultiGPUTrainOneStep(). The former does not require load_batch_into_buffer() to be defined and therefore runs through. Setting framework to None triggers exactly this behavior (as does setting simple_optimizer: True directly).
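
In other words, if you prefer to keep "framework": "torch" for the RandomPolicy, it should also be enough to force the simple optimizer in the config. Untested, but based on the validation logic described above, something like:

config = {
    "env": "CartPole-v1",
    "num_gpus": 0,
    "num_workers": 1,
    "framework": "torch",
    # Forces TrainOneStep() instead of MultiGPUTrainOneStep(), so
    # load_batch_into_buffer() is never called on the RandomPolicy.
    "simple_optimizer": True,
}

ray.tune.run(
    RandomTrainer,
    config=config,
    stop={"timesteps_total": 5},
)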

I admit that this behavior does not make much sense, but right now it has historical reasons, as the RLlib team is reworking a lot while integrating ray.train for training. I guess in the next releases these things will become better arranged (and more powerful :slight_smile: ).
