How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hi,
Having used other RL frameworks before, I have just started playing with RLlib, which I find amazing in its scope and scalability. However, I still find the documentation somewhat confusing. For example, I am trying to find the API reference for the ray.tune.run() method used in various examples, but for some reason I cannot find it in the current docs. Am I missing something here?
My other question concerns the difference between these two statements:
from ray.rllib.agents.ppo import PPOTrainer
import ray.rllib.algorithms.ppo as ppo
What is the exact difference? From looking at the source code, it seems PPOTrainer is an instance of PPO and it gets the default configuration. But apart from that, I do not see much difference, and they also seem to be used interchangeably in the various examples.
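For what it is worth, this is how I have been comparing the two imports side by side. As far as I can tell they end up building the same algorithm object, but this probably depends on the Ray version (the agents path seems to be a deprecated alias for algorithms), so please correct me if this sketch is wrong:

from ray.rllib.agents.ppo import PPOTrainer  # older import path
import ray.rllib.algorithms.ppo as ppo       # newer import path

config = {"env": "CartPole-v0", "framework": "torch"}

trainer_old = PPOTrainer(config=config)  # old-style name
trainer_new = ppo.PPO(config=config)     # new-style name; appears to be the same class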
The final question is that I want to run a training job and automatically stop it once the mean reward reaches a certain threshold:
- Can this only be implemented by making use of the Tune library and setting up a Tuner?
- In that case, how can I retrieve the trained model for evaluation, and how do I perform such an evaluation? For example, I want to evaluate the model over 100 fresh episodes of the given environment and get the mean reward over those 100 episodes (see the sketch below for the kind of loop I have in mind).
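This is roughly the evaluation loop I imagine, assuming compute_single_action() and get_initial_state() are the right calls for a trained trainer with an LSTM model (I am not sure they are, or whether there is a built-in evaluation mechanism I should use instead):

import gym
import numpy as np

def evaluate(trainer, env_name="CartPole-v0", num_episodes=100):
    """Run `num_episodes` fresh episodes and return the mean episode reward."""
    env = gym.make(env_name)
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        # Because the model uses an LSTM, I think the recurrent state has to be
        # carried through the episode by hand.
        state = trainer.get_policy().get_initial_state()
        done = False
        episode_reward = 0.0
        while not done:
            action, state, _ = trainer.compute_single_action(
                obs, state=state, explore=False
            )
            obs, reward, done, _ = env.step(action)
            episode_reward += reward
        returns.append(episode_reward)
    return np.mean(returns)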
At the moment I played with this little example, which worked fine:
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray import tune

config = {
    "env": "CartPole-v0",
    # Change the following line to `"framework": "tf"` to use TensorFlow
    "framework": "torch",
    "model": {
        "use_lstm": True,
    },
}

stop = {"episode_reward_mean": 195}

ray.shutdown()
ray.init(
    num_cpus=8,
    include_dashboard=False,
    ignore_reinit_error=True,
    log_to_driver=False,
)

# execute training
analysis = ray.tune.run(
    "PPO",
    name="PPO-lstm",
    config=config,
    stop=stop,
    checkpoint_at_end=True,
)
But after getting the analysis result, I do not know how to continue from there. Meaning: how can I retrieve the trained model in case I want to continue training? And how can I evaluate the trained model? It is not clear to me how to get the trained model back from the analysis result.
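This is what I guessed the restore step would look like, assuming get_best_trial()/get_best_checkpoint() are the intended ExperimentAnalysis calls and that trainer.restore() accepts whatever get_best_checkpoint() returns (I could not confirm this in the docs, so this is only a guess):

from ray.rllib.agents.ppo import PPOTrainer

best_trial = analysis.get_best_trial(metric="episode_reward_mean", mode="max")
best_checkpoint = analysis.get_best_checkpoint(
    best_trial, metric="episode_reward_mean", mode="max"
)

# Rebuild a trainer with the same config and load the checkpoint into it.
# (Depending on the Ray version, best_checkpoint may be a path string or a
# Checkpoint object; I am assuming restore() accepts it either way.)
trainer = PPOTrainer(config=config)
trainer.restore(best_checkpoint)

# From here I would either call trainer.train() to continue training, or run
# the evaluate() helper sketched above:
print(evaluate(trainer, env_name="CartPole-v0", num_episodes=100))

Is this the right way to do it, or is there a recommended pattern for resuming training and evaluating after tune.run()?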