Then you can leave it out and it will train forever
Another question: if I pause the training and load it again, a new folder is created with the values continuing → is there a way to merge them into the same training run, so it shows up as a single, continuous curve (one color)?
I want to be able to do something like this:
counter = 1
while True:
    # stop after one more training iteration, save, then continue
    result = tune.run("PPO", config=config, stop={"training_iteration": counter}, name="run_name")
    result.save(checkpoint_path)
    counter += 1
Is that possible with tune?
You could do that, but you do not really need to. tune.run has some checkpointing parameters that you could use (see the docs excerpt and the sketch below).
https://docs.ray.io/en/latest/tune/api_docs/execution.html?highlight=tune.run#tune-run
keep_checkpoints_num (int) – Number of checkpoints to keep. A value of None keeps all checkpoints. Defaults to None. If set, need to provide checkpoint_score_attr.
checkpoint_score_attr (str) – Specifies by which attribute to rank the best checkpoint. Default is increasing order. If attribute starts with min- it will rank attribute in decreasing order, i.e. min-validation_loss.
checkpoint_freq (int) – How many training iterations between checkpoints. A value of 0 (default) disables checkpointing. This has no effect when using the Functional Training API.
checkpoint_at_end (bool) – Whether to checkpoint at the end of the experiment regardless of the checkpoint_freq. Default is False. This has no effect when using the Functional Training API
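For example, a minimal sketch of how these could be combined (the experiment name, stop condition, and config are placeholders for your own):

from ray import tune

# Keep only the 5 best checkpoints ranked by episode_reward_mean,
# checkpointing every 10 training iterations and once more at the end.
tune.run(
    "PPO",
    name="my_experiment",                  # placeholder name
    config=config,                         # your existing trainer config
    stop={"training_iteration": 100},      # placeholder stop condition
    keep_checkpoints_num=5,
    checkpoint_score_attr="episode_reward_mean",
    checkpoint_freq=10,
    checkpoint_at_end=True,
)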
What version of ray are you currently using?
There is a bug with rnn sequencing in the latest release.
You can avoid it with these settings (assuming you are not trying to train with multi-gpu).
Screenshot because I am remoting into my training machine.
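In text form (in case the screenshot does not load), the workaround boils down to roughly this; treat it as a sketch rather than a verbatim copy of the screenshot, with config standing in for your trainer config dict:

# Use the sync samples optimizer instead of the default multi-GPU one;
# this avoids the RNN sequencing bug mentioned above.
config["simple_optimizer"] = True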
I am using only a single GPU to train on a single machine (different machines for collecting samples, but I don't think that matters?).
Do I need to set simple_optimizer in my case?
P.S.
tune.run(trainer, name=args.checkpoint, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean", checkpoint_freq=1, checkpoint_at_end=True)
That would save checkpoints in a folder under ~/ray_tune/<args.checkpoint>, right? And if I need to continue training, I would just pass resume=True?
Personally, I would set it until this issue is closed: [Bug] [rllib] RNN sequencing is incorrect · Issue #19976 · ray-project/ray · GitHub.
The simple_optimizer should still be able to use 1 gpu just fine.
That looks good to me.
Resume is for when a training run failed for some reason.
Restore is for continuing training by reloading from a user-specified checkpoint.
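Roughly, as a minimal sketch (the checkpoint path is a placeholder):

# resume=True picks up an interrupted/failed run of the same experiment name:
tune.run("PPO", name="run_name", config=config, resume=True)

# restore starts a fresh run, but initializes from a specific checkpoint:
tune.run(
    "PPO",
    name="run_name",
    config=config,
    restore="/path/to/checkpoint/checkpoint-100",  # placeholder path
)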
DEFAULT_CONFIG = with_common_config({
    # Should use a critic as a baseline (otherwise don't use value baseline;
    # required for using GAE).
    "use_critic": True,
    # If true, use the Generalized Advantage Estimator (GAE)
    # with a value function, see https://arxiv.org/pdf/1506.02438.pdf.
    "use_gae": True,
    # The GAE (lambda) parameter.
    "lambda": 0.995,
    # Initial coefficient for KL divergence.
    "kl_coeff": 0.2,
    # Size of batches collected from each worker.
    "rollout_fragment_length": 64,
    # Number of timesteps collected for each SGD round. This defines the size
    # of each SGD epoch.
    "train_batch_size": 7168,
    # Total SGD batch size across all devices for SGD. This defines the
    # minibatch size within each epoch.
    "sgd_minibatch_size": 128,
    # Number of SGD iterations in each outer loop (i.e., number of epochs to
    # execute per train batch).
    "num_sgd_iter": 10,
    # Whether to shuffle sequences in the batch when training (recommended).
    "shuffle_sequences": False,
    # Stepsize of SGD.
    "lr": 3e-5,
    # Learning rate schedule.
    "lr_schedule": None,
    # Coefficient of the value function loss. IMPORTANT: you must tune this if
    # you set vf_share_layers=True inside your model's config.
    "vf_loss_coeff": 1.25,
    "model": {
        # Share layers for value function. If you set this to True, it's
        # important to tune vf_loss_coeff.
        "vf_share_layers": False,
        "fcnet_hiddens": [1024, 1024],
        "fcnet_activation": "relu",
        "use_lstm": True,
        "max_seq_len": 16,
        "lstm_cell_size": 512,
        "lstm_use_prev_action": False
    },
    # Coefficient of the entropy regularizer.
    "entropy_coeff": 0.00005,
    # Decay schedule for the entropy regularizer.
    "entropy_coeff_schedule": None,
    # PPO clip parameter.
    "clip_param": 0.3,
    # Clip param for the value function. Note that this is sensitive to the
    # scale of the rewards. If your expected V is large, increase this.
    "vf_clip_param": 30.0,
    # If specified, clip the global norm of gradients by this amount.
    "grad_clip": None,
    # Target value for KL divergence.
    "kl_target": 0.02,
    # Whether to rollout "complete_episodes" or "truncate_episodes".
    "batch_mode": "complete_episodes",
    # Which observation filter to apply to the observation.
    "observation_filter": "NoFilter",
    # Uses the sync samples optimizer instead of the multi-gpu one. This is
    # usually slower, but you might want to try it if you run into issues with
    # the default optimizer.
    "simple_optimizer": True,
    # "reuse_actors": True,
    "num_gpus": 1,
    # Use the connector server to generate experiences.
    "input": (
        lambda ioctx: PolicyServerInput(ioctx, args.ip, 55556)
    ),
    # Use a single worker process to run the server.
    "num_workers": 0,
    # Disable OPE, since the rollouts are coming from online clients.
    "input_evaluation": [],
    # "callbacks": MyCallbacks,
    "env_config": {"sleep": True},
    "framework": "tf",
    # "eager_tracing": True,
    "explore": True,
    "create_env_on_driver": False,
    "log_sys_usage": False,
    "compress_observations": True
})
allianceId = 27
heroId = 72
localHeroId = 100
itemId = 70
localItemId = 10
x = 8
y = 5
DEFAULT_CONFIG["env_config"]["observation_space"] = ......
DEFAULT_CONFIG["env_config"]["action_space"] = ....
ray.init(log_to_driver=False)
trainer = PPOTrainer(config=DEFAULT_CONFIG, env=RandomEnv)
tune.run(trainer, name=args.checkpoint, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean", checkpoint_freq=1, checkpoint_at_end=True)
Is that good? Note that I added simple_optimizer in the overall trainer config.
It looks like you did not provide tune.run() with a Trainable or a registered Trainer. You must specify this either by using an already registered Trainer, like:
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": "CartPole-v0",
        "num_gpus": 0,
        "num_workers": 1,
        "lr": tune.grid_search([0.01, 0.001, 0.0001]),
    },
)
or by creating a Trainable with build_trainer() from ray.rllib.agents.trainer_template, like:
from ray.rllib.agents.trainer_template import build_trainer

MyTrainer = build_trainer(
    name="MyPolicy",
    default_policy=MyPolicy,
)
...
# Pass the Trainable class itself (not an instance) to tune.run:
tune.run(MyTrainer, ...)
Hope this helps
Thanks everyone, it seems I managed to get it working with everyone's help.
Just one question: why is it printing twice?
This is simply a mechanical thing: tune prints a status update every couple of seconds. If an iteration takes longer, you get more prints (all identical in most cases); if it is faster, you get fewer. See this answer to my question for more info.
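If the repeated status tables bother you, you can print them less often; a sketch, assuming your Ray version's CLIReporter accepts max_report_frequency:

from ray import tune
from ray.tune import CLIReporter

# Print the status table at most once every 30 seconds
# instead of the default few-second interval.
reporter = CLIReporter(max_report_frequency=30)
tune.run("PPO", config=config, progress_reporter=reporter)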
I think you forgot to link the question