How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi all,
I’ve come to realise that my understanding of several parameters (num_gpus, num_rollout_workers, the per-worker CPU/GPU settings, num_learner_workers, and the evaluation workers) is flawed, and the documentation I could find did not clear it up.
So I’m hoping someone can clarify the following points:
All of this assumes a single machine with one GPU and a 12-core/24-thread CPU.
1. For the num_gpus parameter, the docs say: "Number of GPUs to allocate to the algorithm process."
→ What does this mean? What is the algorithm process? What is it responsible for, and what does it do?

2. num_rollout_workers: Am I correct in understanding that any requests for actions will be automatically load-balanced between the n rollout workers?
2.5: When specifying num_cpus_per_worker and num_gpus_per_worker, are those resources available to each rollout worker to perform the forward/prediction passes when computing actions (get_action)?
2.75: If GPUs are assigned to the workers, does each worker keep a copy of the model on the GPU for the forward pass, so that it doesn't really need CPUs?

3. Is the learner worker the one that runs the training loop? So assigning it CPUs + a GPU means it will use those resources for training (and, if given a GPU, it keeps a copy of the model on the GPU)?
3.25: After a training iteration, are these resources released?
3.5: Reason for the above: in the code below I assign a total of 1.0 GPU (0.25 for the algorithm?, 0.25 for the 2 workers, and 0.25 for the learner) and 6 CPUs (2 each for the 2 rollout workers and 2 for the learner). However, tune.run reports utilising only 5.0 CPUs and 0.75 GPU. Why is the rest not used? And why 5.0 CPUs? My guess is 4 for the workers and 1 for the driver; in that case, how do I specify resources for the driver, and what is the difference between the driver and the algorithm CPU? (See the annotated resource sketch right after this list.)

4. What is the evaluation worker, and what is its purpose? Can anyone provide an explanation or documentation links, please?
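To make the questions concrete, here is the resource-related part of my config rewritten with the builder methods (.rollouts() / .resources()), which I believe is equivalent to the attribute assignments in the full script below. The comments mark which process I *think* each setting applies to; those are exactly the assumptions I'm asking about, so please correct me:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Question 2: action requests load-balanced across these workers?
    .rollouts(num_rollout_workers=2)
    .resources(
        num_gpus=0.25,                    # Question 1: for "the algorithm process"?
        num_cpus_per_worker=2,            # Question 2.5: per rollout worker, for forward passes?
        num_gpus_per_worker=0.25,         # Question 2.75: a model copy on the GPU per worker?
        num_learner_workers=1,            # Question 3: the process that runs the training loop?
        num_cpus_per_learner_worker=2,
        num_gpus_per_learner_worker=0.5,  # Question 3.25: released after a training iteration?
    )
)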
Sample code below:
import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig
import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box
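# Tell Ray how much of this machine it may schedule: 9 CPUs and the single GPU.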
ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)
ppo_config = PPOConfig()
parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')
parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')
args = parser.parse_args()
def _input(ioctx):
# We are remote worker, or we are local worker with num_workers=0:
# Create a PolicyServerInput.
if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
return PolicyServerInput(
ioctx,
args.ip,
55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
)
# No InputReader (PolicyServerInput) needed.
else:
return None
x = 320
y = 240
# kl_coeff, ->default 0.2
# ppo_config.gamma = 0.01 # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998 # default 0.99
ppo_config.lambda_ = 0.99 # default 1.0???
ppo_config.kl_target = 0.01 # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2 # default 30???
ppo_config.num_sgd_iter = 7 # default 30???
# ppo_config.lr = 3.5e-5 # 5e-5
ppo_config.lr = 9e-5 # 5e-5
ppo_config.model = {
"vf_share_layers": True,
"use_lstm": True,
"max_seq_len": 32,
"lstm_cell_size": 128,
"lstm_use_prev_action": True,
"conv_filters": [
# 240 X 320
[16, [5, 5], 3],
[32, [5, 5], 3],
[64, [5, 5], 3],
[128, [3, 3], 2],
[256, [3, 3], 2],
[512, [3, 3], 2],
],
"conv_activation": "relu",
"post_fcnet_hiddens": [512],
"post_fcnet_activation": "relu"
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True
ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
[
2, # W
2, # A
2, # S
2, # D
2, # Space
2, # H
2, # J
2, # K
2 # L
]
)
ppo_config.env_config = {
"sleep": True,
'replayOn': False
}
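# Question 2: two rollout workers; via _input() each remote worker gets its own PolicyServerInput port.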
ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)
ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
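# Resource settings that questions 1, 2.5, 2.75, 3 and 3.5 above are about.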
ppo_config.num_gpus = 0.25
ppo_config.num_gpus_per_worker = 0.25
ppo_config.num_cpus_per_worker = 2
ppo_config.num_learner_workers = 1
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.5
tempyy = ppo_config.to_dict()
from ray import tune
name = "" + args.checkpoint
print(f"Starting: {name}")
tune.run("PPO",
resume='AUTO',
config=tempyy,
name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
max_failures=1,
checkpoint_freq=5, checkpoint_at_end=True)
Image for 3.5: screenshot of the tune.run status output reporting 5.0 CPUs and 0.75 GPUs in use (not reproduced here).