A little help for a novice

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hi,

Having used other RL frameworks before, I have just started playing with RLlib, which I find amazing in its scope and scalability. However, I still find the documentation somewhat confusing. For example, I am trying to find the API reference for the ray.tune.run() method used in various examples, but for some reason I cannot find it in the current docs. Am I missing something here?

Another question: I am confused about the difference between these two statements:

from ray.rllib.agents.ppo import PPOTrainer

import ray.rllib.algorithms.ppo as ppo

What is the exact difference? From looking at the source code, it seems PPOTrainer is just PPO with the default configuration, but apart from that I do not see much difference, and the two seem to be used interchangeably in the various examples.
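
For context, the ray.rllib.agents.ppo path is only kept for backwards compatibility in recent Ray 2.x releases, while ray.rllib.algorithms.ppo is the current home of PPO. Here is a minimal sketch of the newer, config-builder style (an illustration assuming a Ray 2.x release, not taken from the snippets above; the exact builder methods can differ between versions):

# Current import path: the PPO config builder lives under ray.rllib.algorithms.
from ray.rllib.algorithms.ppo import PPOConfig

# New-style usage: build the algorithm from a PPOConfig instead of a raw config dict.
algo = (
    PPOConfig()
    .environment(env="CartPole-v0")
    .framework("torch")
    .training(model={"use_lstm": True})
    .build()
)

result = algo.train()  # one training iteration; returns a plain results dict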

The final question: I want to run a training job and automatically stop it once the mean reward reaches a certain threshold:

  • Can this only be implemented by using the Tune library and setting up a Tuner?
  • In that case, how can I retrieve the trained model for evaluation, and how do I perform that evaluation (for example, I want to evaluate the model over 100 fresh episodes of the given environment and get the mean reward over those 100 episodes)?

So far I have played with this little example, which worked fine:

import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray import tune

config = {
    "env": "CartPole-v0",
    # Change the following line to `"framework": "tf"` to use TensorFlow
    "framework": "torch",
    "model": {
      "use_lstm": True,
    },
}
stop = {"episode_reward_mean": 195}

ray.shutdown()
ray.init(
  num_cpus=8,
  include_dashboard=False,
  ignore_reinit_error=True,
  log_to_driver=False,
)
# Execute training.
analysis = ray.tune.run(
  "PPO",
  name="PPO-lstm",
  config=config,
  stop=stop,
  checkpoint_at_end=True,
)

But after getting the analysis results, I do not know how to continue from there: how can I retrieve the trained model in case I want to continue training, and how can I evaluate the trained model? It is not clear to me how to retrieve the trained model from the analysis result.

Happy to hear what you have to say about scope and scalability!

  1. We are undergoing many larger changes that call for more/better documentation. Please be patient with us as we try to catch up with this :slight_smile: Our algorithms implement the tune.Trainable API, and that's our only intersection with Tune. Please refer to the Tune docs to learn how to tune a Trainable, and to the examples to learn how to use it together with RLlib.
  2. Don’t use ray.rllib.agents.ppo anymore. It’s a remnant of some refactoring. The ray.rllib.algorithms.ppo path includes the Algorithm class and everything you need.
  3. Tune is meant for exactly that sort of thing. You can also implement the stopping logic yourself, but with Tune you can explore your HP space, stop on a plateau, and much more! Please check it out.
    Have a look at the basic tune.Trainable example to see how you can retrieve the best Trainable that Tune found. That Trainable will be an RLlib Algorithm object that you can then use to evaluate manually; a rough sketch follows below.
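
As a rough sketch of that last point, assuming the config dict and the analysis object from the example above, a Ray 2.x release, and the classic gym API (env.step() returning a 4-tuple); exact checkpoint/restore types vary a bit across Ray versions:

import gym
from ray.rllib.algorithms.ppo import PPO

# Pick the best trial and checkpoint that Tune found for the stopping metric.
best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
best_checkpoint = analysis.get_best_checkpoint(
    best_trial, metric="episode_reward_mean", mode="max"
)  # may be a path string or a Checkpoint object, depending on the Ray version

# Rebuild the algorithm with the same config and load the trained weights.
# Calling algo.train() afterwards would continue training from this point.
algo = PPO(config=config)
algo.restore(best_checkpoint)

# Manual evaluation over 100 fresh episodes. Because the config above sets
# "use_lstm": True, the recurrent state has to be threaded through
# compute_single_action() explicitly.
env = gym.make("CartPole-v0")
policy = algo.get_policy()
returns = []
for _ in range(100):
    obs = env.reset()
    state = policy.get_initial_state()
    done, ep_return = False, 0.0
    while not done:
        action, state, _ = algo.compute_single_action(obs, state=state, explore=False)
        obs, reward, done, _ = env.step(action)
        ep_return += reward
    returns.append(ep_return)

print("mean reward over 100 episodes:", sum(returns) / len(returns))

Alternatively, Algorithm.evaluate() (with evaluation settings in the config) can run the evaluation for you; the manual loop above just makes the mechanics explicit.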

Cheers
