I'm trying to evaluate and compare training performance for an RL model with Ray Tune on two AWS instances: one with 1 GPU and one with 4 GPUs. However, training appears to get stuck on the first sample and never finishes. The console shows no warnings or errors.
The same code works fine when using only CPUs.
== Status ==
Current time: 2023-01-23 00:44:50 (running for 01:25:53.07)
Memory usage on this node: 16.1/59.9 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: None | Iter 1.000: None
Resources requested: 4.0/8 CPUs, 1.0/1 GPUs, 0.0/34.66 GiB heap, 0.0/17.33 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ubuntu/ray_results/AIRDQN_2023-01-22_23-18-57
Number of trials: 2/40 (1 PENDING, 1 RUNNING)
+-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------+
| Trial name | status | loc | batch_mode | lr | model/fcnet_activati | model/fcnet_hiddens | observation_filter | train_batch_size |
| | | | | | on | | | |
|-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------|
| AIRDQN_2574d0b0 | RUNNING | 10.101.11.168:15888 | complete_episodes | 2.93357e-06 | elu | (128, 8) | MeanStdFilter | 5000 |
| AIRDQN_305150c6 | PENDING | | truncate_episodes | 5.45804e-05 | relu | (64, 16) | NoFilter | 5000 |
+-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------+
Settings:
RL algorithm: ray.rllib.algorithms.dqn.DQN
Input data type: offline data
Search algorithm: ray.tune.search.optuna.OptunaSearch
Scheduler: ray.tune.schedulers.ASHAScheduler
Off-policy evaluation method: ray.rllib.offline.estimators.DoublyRobust
The code below shows how the RLTrainer is created and configured, along with the other artifacts used in model training and off-policy evaluation. Do I need to configure the ScalingConfig object differently for this to work with one and with multiple GPUs, beyond simply setting use_gpu=True? (A sketch of the kind of change I have in mind follows after the code.)
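For completeness, these methods live inside my training class; the relevant imports look roughly like this (Ray 2.x paths):

# imports used by the snippets below (paths as of Ray 2.x)
import ray
from ray import tune
from ray.tune import Tuner
from ray.tune.search.optuna import OptunaSearch
from ray.tune.schedulers import ASHAScheduler
from ray.air.config import RunConfig, ScalingConfig
from ray.train.rl import RLTrainer
from ray.rllib.offline.estimators import DoublyRobust
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel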
def train(self):
    # load train data
    train_dataset = ray.data.read_json(self.offline_data_info['train_dir'])
    # create trainer
    trainer = self.create_trainer(train_dataset=train_dataset)
    # search algorithm
    search_algo = OptunaSearch(
        metric='evaluation/off_policy_estimator/doubly_robust_fitted_q_eval/v_target',
        mode='max'
    )
    # scheduler
    scheduler = ASHAScheduler(
        metric='evaluation/off_policy_estimator/doubly_robust_fitted_q_eval/v_target',
        mode='max',
        time_attr='training_iteration',
        max_t=5,
        grace_period=1
    )
    # create tuner
    tuner = Tuner(
        # trainer
        trainer,
        # tune configuration
        tune_config=self.create_tune_config(
            search_algo=search_algo,
            scheduler=scheduler
        ),
        # hyper-parameters
        param_space=self.create_param_space(),
        # checkpointing - RunConfig checkpoint settings don't work here in Ray AIR,
        # so use _tuner_kwargs to request a checkpoint at the end
        _tuner_kwargs=dict(checkpoint_at_end=True),
    )
    # train models
    result_grid = tuner.fit()
    # convert the result grid content to a pandas dataframe
    df_result = self.create_results_dataframe(result_grid=result_grid)
    return df_result

def create_trainer(self, train_dataset):
    return RLTrainer(
        # run config
        run_config=RunConfig(
            stop=dict(training_iteration=5),
            verbose=3
        ),
        # scaling config
        scaling_config=ScalingConfig(
            use_gpu=True
        ),
        # train dataset
        datasets=dict(train=train_dataset),
        # algorithm
        algorithm='DQN',
        # algorithm config
        config=dict(
            action_space=self.action_space,
            observation_space=self.observation_space,
            framework='torch',
            evaluation_interval=1,
            evaluation_duration=10000,
            evaluation_duration_unit='episodes',
            evaluation_parallel_to_training=False,
            evaluation_num_workers=1,
            evaluation_config=dict(input=self.offline_data_info['test_dir']),
            # off-policy estimation
            off_policy_estimation_methods=dict(
                # doubly robust method
                doubly_robust_fitted_q_eval=dict(
                    type=DoublyRobust,
                    q_model_config=dict(
                        type=FQETorchModel,
                        model=[64]
                    )
                )
            )
        )
    )

def create_tune_config(self, search_algo, scheduler):
    return tune.TuneConfig(
        num_samples=40,
        search_alg=search_algo,
        scheduler=scheduler
    )

def create_param_space(self):
    return dict(
        lr=tune.loguniform(1e-6, 1e-3),
        observation_filter=tune.choice(['NoFilter', 'MeanStdFilter']),
        batch_mode=tune.choice(['truncate_episodes', 'complete_episodes']),
        train_batch_size=tune.choice([5000]),
        model=dict(
            fcnet_activation=tune.choice(['relu', 'elu']),
            fcnet_hiddens=tune.choice(self.network_configurations)
        )
    )
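For what it's worth, the kind of ScalingConfig change I've been wondering about looks roughly like the sketch below. The worker counts and resource values are placeholders I have not verified; whether any of this is actually needed is exactly what I'm unsure about:

# Hypothetical alternative for the 4-GPU instance - is something like this required,
# or should use_gpu=True alone be enough?
scaling_config=ScalingConfig(
    use_gpu=True,
    num_workers=4,                      # one worker per GPU? (placeholder value)
    resources_per_worker={"GPU": 1},    # pin one GPU per worker? (placeholder value)
)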
Thanks,
Stefan