Use Policy_Trainer with TensorBoard

Then you can leave it out and it will train forever.

Another question: if I pause the training and load it again, a new folder is created with the values continuing. Is there a way to merge them into the same run, so that TensorBoard shows it as a single, continuous curve in one color?

I want to be able to do something like this:

counter = 1

while True:
    result = tune.run("PPO", config=config,
                      stop={"training_iteration": counter},  # pseudocode: stop after `counter` iterations
                      name="run_name")
    result.save(checkpoint_path)  # pseudocode: persist a checkpoint here
    counter += 1

Is that possible with tune?

@Denys_Ashikhin,

You could do that, but you do not really need to. tune.run has some checkpointing parameters that you could use (see the sketch after the parameter list below).

https://docs.ray.io/en/latest/tune/api_docs/execution.html?highlight=tune.run#tune-run

keep_checkpoints_num (int) – Number of checkpoints to keep. A value of None keeps all checkpoints. Defaults to None. If set, need to provide checkpoint_score_attr.

checkpoint_score_attr (str) – Specifies by which attribute to rank the best checkpoint. Default is increasing order. If attribute starts with min- it will rank attribute in decreasing order, i.e. min-validation_loss.

checkpoint_freq (int) – How many training iterations between checkpoints. A value of 0 (default) disables checkpointing. This has no effect when using the Functional Training API.

checkpoint_at_end (bool) – Whether to checkpoint at the end of the experiment regardless of the checkpoint_freq. Default is False. This has no effect when using the Functional Training API.
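For example, a minimal sketch of how these could fit together, assuming a registered "PPO" trainer and placeholder values for the name, config, and stopping criterion:

from ray import tune

analysis = tune.run(
    "PPO",
    name="my_run",                                 # placeholder experiment name
    config=config,                                 # your trainer config
    stop={"training_iteration": 1000},             # placeholder stopping criterion
    checkpoint_freq=1,                             # checkpoint every training iteration
    checkpoint_at_end=True,                        # also checkpoint when the run ends
    keep_checkpoints_num=10,                       # keep only the 10 best checkpoints...
    checkpoint_score_attr="episode_reward_mean",   # ...ranked by this metric
)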

@Denys_Ashikhin,

What version of ray are you currently using?

There is a bug with rnn sequencing in the latest release.

You can avoid it with these settings (assuming you are not trying to train with multi-gpu).

[screenshot of the suggested settings]
Screenshot because I am remoting into my training machine.
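Since the details are in the screenshot, here is only a rough sketch of the kind of settings meant, assuming (from the replies below) that the key one is simple_optimizer and that your trainer config dict is called config:

config.update({
    "simple_optimizer": True,  # avoid the multi-GPU train op affected by the RNN sequencing bug
    "num_gpus": 1,             # a single GPU still works with the simple optimizer
})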
I am using only a single GPU to train on a single machine (different machines for collecting samples, but I don't think that matters?).

Do I need to set simple_optimizer in my case?

P.S.

tune.run(trainer, name=args.checkpoint, keep_checkpoints_num=None,
         checkpoint_score_attr="episode_reward_mean", checkpoint_freq=1,
         checkpoint_at_end=True)
That would save checkpoints under ~/ray_results/<args.checkpoint>. And if I need to continue training, I would just pass resume=True?

@Denys_Ashikhin,

Personally, I would, at least until this issue is closed: [Bug] [rllib] RNN sequencing is incorrect · Issue #19976 · ray-project/ray · GitHub.

The simple_optimizer should still be able to use 1 gpu just fine.

That looks good to me.

Resume is for when a training run failed (or was interrupted) for some reason.

Restore is to continue training by reloading from a user-specified checkpoint.
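A minimal sketch of the difference, assuming a registered "PPO" trainer and placeholder names and paths:

# resume: pick an interrupted or failed experiment back up from its existing folder
tune.run("PPO", name="my_experiment", config=config, resume=True)

# restore: start training again from a specific checkpoint that you choose
tune.run("PPO", name="my_experiment", config=config,
         restore="/path/to/checkpoint_000010/checkpoint-10")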

# Imports assumed from the rest of the script (not shown in the original post);
# `args` comes from the script's own argument parsing.
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.agents.trainer import with_common_config
from ray.rllib.env.policy_server_input import PolicyServerInput
from ray.rllib.examples.env.random_env import RandomEnv

DEFAULT_CONFIG = with_common_config({
    # Should use a critic as a baseline (otherwise don't use value baseline;
    # required for using GAE).
    "use_critic": True,
    # If true, use the Generalized Advantage Estimator (GAE)
    # with a value function, see https://arxiv.org/pdf/1506.02438.pdf.
    "use_gae": True,
    # The GAE (lambda) parameter.
    "lambda": 0.995,
    # Initial coefficient for KL divergence.
    "kl_coeff": 0.2,
    # Size of batches collected from each worker.
    "rollout_fragment_length": 64,
    # Number of timesteps collected for each SGD round. This defines the size
    # of each SGD epoch.
    "train_batch_size": 7168,
    # Total SGD batch size across all devices for SGD. This defines the
    # minibatch size within each epoch.
    "sgd_minibatch_size": 128,
    # Number of SGD iterations in each outer loop (i.e., number of epochs to
    # execute per train batch).
    "num_sgd_iter": 10,
    # Whether to shuffle sequences in the batch when training (recommended).
    "shuffle_sequences": False,
    # Stepsize of SGD.
    "lr": 3e-5,
    # Learning rate schedule.
    "lr_schedule": None,
    # Coefficient of the value function loss. IMPORTANT: you must tune this if
    # you set vf_share_layers=True inside your model's config.
    "vf_loss_coeff": 1.25,
    "model": {
        # Share layers for value function. If you set this to True, it's
        # important to tune vf_loss_coeff.
        "vf_share_layers": False,

        "fcnet_hiddens": [1024, 1024],
        "fcnet_activation": "relu",
        "use_lstm": True,
        "max_seq_len": 16,
        "lstm_cell_size": 512,
        "lstm_use_prev_action": False
    },
    # Coefficient of the entropy regularizer.
    "entropy_coeff": 0.00005,
    # Decay schedule for the entropy regularizer.
    "entropy_coeff_schedule": None,
    # PPO clip parameter.
    "clip_param": 0.3,
    # Clip param for the value function. Note that this is sensitive to the
    # scale of the rewards. If your expected V is large, increase this.
    "vf_clip_param": 30.0,
    # If specified, clip the global norm of gradients by this amount.
    "grad_clip": None,
    # Target value for KL divergence.
    "kl_target": 0.02,
    # Whether to rollout "complete_episodes" or "truncate_episodes".
    "batch_mode": "complete_episodes",
    # Which observation filter to apply to the observation.
    "observation_filter": "NoFilter",
    # Uses the sync samples optimizer instead of the multi-gpu one. This is
    # usually slower, but you might want to try it if you run into issues with
    # the default optimizer.
    "simple_optimizer": True,
    #"reuse_actors": True,
    "num_gpus": 1,
    # Use the connector server to generate experiences.
    "input": (
        lambda ioctx: PolicyServerInput(ioctx, args.ip, 55556)
    ),
    # Use a single worker process to run the server.
    "num_workers": 0,
    # Disable OPE, since the rollouts are coming from online clients.
    "input_evaluation": [],
    # "callbacks": MyCallbacks,
    "env_config": {"sleep": True},
    "framework": "tf",
    # "eager_tracing": True,
    "explore": True,
    "create_env_on_driver": False,
    "log_sys_usage": False,
    "compress_observations": True
})

allianceId = 27
heroId = 72
localHeroId = 100
itemId = 70
localItemId = 10
x = 8
y = 5
DEFAULT_CONFIG["env_config"]["observation_space"] = ......
DEFAULT_CONFIG["env_config"]["action_space"] = ....

ray.init(log_to_driver=False)

trainer = PPOTrainer(config=DEFAULT_CONFIG, env=RandomEnv)

tune.run(trainer, name=args.checkpoint, keep_checkpoints_num=None,
         checkpoint_score_attr="episode_reward_mean", checkpoint_freq=1,
         checkpoint_at_end=True)

Is that good? Note that I added simple_optimizer in the overall trainer config.


So something is a little off:

@Denys_Ashikhin ,

It looks like you did not provide tune.run() with a Trainable or a registered Trainer. You must specify this either by using an already registered Trainer, like:

import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},
    config={
        "env": "CartPole-v0",
        "num_gpus": 0,
        "num_workers": 1,
        "lr": tune.grid_search([0.01, 0.001, 0.0001]),
    },
)

or by creating a Trainable by using build_trainer() from ray.rllib.agents.trainer_template like:

from ray.rllib.agents.trainer_template import build_trainer

MyTrainer = build_trainer(
    name="MyPolicy",
    default_policy=MyPolicy,
)

...
tune.run(MyTrainer, ...)  # pass the Trainable class itself, not an instance

Hope this helps


Thanks everyone, it seems like I managed to get it working with your help.

Just one question, why is it printing twice?

This is simply mechanical: tune prints a status report every couple of seconds. If an iteration takes longer, you get more prints (almost always identical); if it is faster, you get fewer. See this answer to my question for more info.
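If the repeated prints bother you, one option (not from the linked answer, just a sketch using tune's CLIReporter) is to lower the report frequency:

from ray import tune
from ray.tune import CLIReporter

reporter = CLIReporter(max_report_frequency=30)   # print the status table at most every 30 seconds
tune.run("PPO", config=config, progress_reporter=reporter)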

I think you forgot to link the question :sweat_smile: