OS: Ubuntu 20.04.6 LTS
Server: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz, 72 logical CPUs
GPU: Tesla K80 (an old card from 2014)
Python: 3.11.8
Ray: 2.37.0
What config settings should I use for the fastest PPO training on a dataset of ~600k rows and 300 columns? Right now training takes many hours or even days, and a single episode can take an hour.
I have tried many variations, including GPU-only and CPU-only setups, but did not see a big difference.
My config:
config = (
    PPOConfig()
    .framework("torch")
    .api_stack(
        # Set to False because enabling the new API stack gave me an error.
        enable_rl_module_and_learner=False,
        enable_env_runner_and_connector_v2=False,
    )
    .environment(
        env=select_env,
        env_config={},
    )
    # .evaluation(
    #     evaluation_num_env_runners=1,
    #     evaluation_interval=1,
    #     # Run evaluation parallel to training to speed up the example.
    #     evaluation_parallel_to_training=False,
    # )
    .env_runners(
        num_env_runners=10,  # 2
        num_cpus_per_env_runner=1,
        num_envs_per_env_runner=10,  # 3
        num_gpus_per_env_runner=0,
    )
    .resources(
        num_gpus=1,
        # num_cpus_per_worker=1,
        # num_gpus_per_worker=0,
    )
    .learners(
        # num_learners=1,
        # num_cpus_per_learner=2,
        # num_gpus_per_learner=0,
    )
    .training(
        # train_batch_size_per_learner=256,
        train_batch_size=5000,
    )
    .experimental(
        # _disable_preprocessor_api=False,
        # _enable_new_api_stack=True,
    )
)
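
For context, the driver script is nothing special, essentially just build-and-train in a loop. A minimal sketch (simplified; the iteration count is just a placeholder):

import time

# Build the PPO algorithm from the config above and run training iterations,
# printing how long each iteration takes.
algo = config.build()

for i in range(100):
    start = time.time()
    result = algo.train()
    print(
        f"iter {i}: "
        f"episode_reward_mean={result.get('episode_reward_mean')}, "
        f"took {time.time() - start:.1f}s"
    )

algo.stop()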
Here is the config ChatGPT suggested for my case, but it fails with “TypeError: AlgorithmConfig.env_runners() got an unexpected keyword argument ‘num_workers’”.
Does anyone know what .rollouts() refers to here? I cannot find any documentation for it on ray.io.
config = PPOConfig() \
    .environment(your_env) \
    .rollouts(
        num_workers=60,               # Adjust based on testing
        num_envs_per_worker=4,        # More parallel environments per worker
        rollout_fragment_length=400,  # Balance throughput vs overhead
    ) \
    .training(
        train_batch_size=50000,       # Larger batch size for PPO
        sgd_minibatch_size=4000,      # Efficient use of GPU during SGD
        num_sgd_iter=10,              # Number of epochs per update
        lr=0.0003,                    # Learning rate, can tune for your environment
    ) \
    .resources(
        num_gpus=1,                   # Use the GPU
    ) \
    .evaluation(
        evaluation_num_workers=6,     # Parallel evaluation workers
    )
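
My best guess is that .rollouts() is an older name for what is now .env_runners(), so the suggestion might translate to something like the sketch below for 2.37. This is untested, and I am not sure the .training() parameter names (sgd_minibatch_size, num_sgd_iter) or evaluation_num_env_runners are still correct in this version, so treat those as assumptions:

# Untested sketch: my attempt to map the suggested .rollouts() settings onto
# the .env_runners() / .evaluation() API in Ray 2.37.
config = (
    PPOConfig()
    .environment(env=select_env)
    .env_runners(
        num_env_runners=60,              # was: num_workers
        num_envs_per_env_runner=4,       # was: num_envs_per_worker
        rollout_fragment_length=400,
    )
    .training(
        train_batch_size=50000,
        sgd_minibatch_size=4000,         # name may differ in newer versions
        num_sgd_iter=10,                 # same caveat
        lr=0.0003,
    )
    .resources(num_gpus=1)
    .evaluation(
        evaluation_num_env_runners=6,    # was: evaluation_num_workers
    )
)

Even if the names are right, I am not sure 60 env runners plus 6 evaluation runners is sensible on 72 logical CPUs, so advice on sizing these numbers would also help.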