RL training stuck when using Ray Tune and GPU

I am trying to evaluate and compare training performance for an RL model using Ray Tune on two AWS instances: one with 1 GPU and one with 4 GPUs. However, training appears to get stuck on the first sample and never finishes. The console shows no warnings or errors.

The same code works fine when using only CPUs.

== Status ==
Current time: 2023-01-23 00:44:50 (running for 01:25:53.07)
Memory usage on this node: 16.1/59.9 GiB
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 4.000: None | Iter 1.000: None
Resources requested: 4.0/8 CPUs, 1.0/1 GPUs, 0.0/34.66 GiB heap, 0.0/17.33 GiB objects (0.0/1.0 accelerator_type:V100)
Result logdir: /home/ubuntu/ray_results/AIRDQN_2023-01-22_23-18-57
Number of trials: 2/40 (1 PENDING, 1 RUNNING)
+-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------+
| Trial name      | status   | loc                 | batch_mode        |          lr | model/fcnet_activati   | model/fcnet_hiddens   | observation_filter   |   train_batch_size |
|                 |          |                     |                   |             | on                     |                       |                      |                    |
|-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------|
| AIRDQN_2574d0b0 | RUNNING  | 10.101.11.168:15888 | complete_episodes | 2.93357e-06 | elu                    | (128, 8)              | MeanStdFilter        |               5000 |
| AIRDQN_305150c6 | PENDING  |                     | truncate_episodes | 5.45804e-05 | relu                   | (64, 16)              | NoFilter             |               5000 |
+-----------------+----------+---------------------+-------------------+-------------+------------------------+-----------------------+----------------------+--------------------+

Settings:
RL algorithm: ray.rllib.algorithms.dqn.DQN
Input data type: offline data
Search algorithm: ray.tune.search.optuna.OptunaSearch
Scheduler: ray.tune.schedulers.ASHAScheduler
Off-policy evaluation method: ray.rllib.offline.estimators.DoublyRobust

The following code creates and configures the RLTrainer, along with the other artifacts used in model training and off-policy evaluation. Do I need to configure the ScalingConfig object differently for this to work with one GPU versus multiple GPUs, beyond simply setting use_gpu=True?

# imports used by the snippet below (Ray 2.x AIR API paths)
import ray
from ray import tune
from ray.air.config import RunConfig, ScalingConfig
from ray.rllib.offline.estimators import DoublyRobust
from ray.rllib.offline.estimators.fqe_torch_model import FQETorchModel
from ray.train.rl import RLTrainer
from ray.tune import Tuner
from ray.tune.schedulers import ASHAScheduler
from ray.tune.search.optuna import OptunaSearch


def train(self):

    # load train data
    train_dataset = ray.data.read_json(self.offline_data_info['train_dir'])

    # create trainer
    trainer = self.create_trainer(train_dataset=train_dataset)

    # search algorithm
    search_algo = OptunaSearch(
        metric='evaluation/off_policy_estimator/doubly_robust_fitted_q_eval/v_target',
        mode='max'
    )

    # scheduler
    scheduler = ASHAScheduler(
        metric='evaluation/off_policy_estimator/doubly_robust_fitted_q_eval/v_target',
        mode='max',
        time_attr='training_iteration',
        max_t=5,
        grace_period=1
    )

    # create tuner
    tuner = Tuner(

        # trainer
        trainer,

        # create tune configuration
        tune_config=self.create_tune_config(
            search_algo=search_algo,
            scheduler=scheduler
        ),

        # hyper-parameters
        param_space=self.create_param_space(),

        # save checkpoint - the run configuration doesn't handle this in Ray AIR, so use _tuner_kwargs to specify checkpoint settings
        _tuner_kwargs=dict(checkpoint_at_end=True),
    )

    # train models
    result_grid = tuner.fit()

    # convert content in result grid to pandas dataframe
    df_result = self.create_results_dataframe(result_grid=result_grid)

    return df_result

def create_trainer(self, train_dataset):

    return RLTrainer(

        # run config
        run_config=RunConfig(
            stop=dict(training_iteration=5),
            verbose=3
        ),

        # scaling config
        scaling_config=ScalingConfig(
            use_gpu=True
        ),

        # train dataset
        datasets=dict(train=train_dataset),

        # algorithm
        algorithm='DQN',

        # config
        config=dict(
            action_space=self.action_space,
            observation_space=self.observation_space,
            framework='torch',
            evaluation_interval=1,
            evaluation_duration=10000,
            evaluation_duration_unit='episodes',
            evaluation_parallel_to_training=False,
            evaluation_num_workers=1,
            evaluation_config=dict(input=self.offline_data_info['test_dir']),

            # off-policy estimation
            off_policy_estimation_methods=dict(

                # doubly robust method
                doubly_robust_fitted_q_eval=dict(
                    type=DoublyRobust,
                    q_model_config=dict(
                        type=FQETorchModel,
                        model=[64]
                    )
                )
            )
        )
    )

def create_tune_config(self, search_algo, scheduler):

    return tune.TuneConfig(
        num_samples=40,
        search_alg=search_algo,
        scheduler=scheduler
    )

def create_param_space(self):

    return dict(
        lr=tune.loguniform(1e-6, 1e-3),
        observation_filter=tune.choice(['NoFilter', 'MeanStdFilter']),
        batch_mode=tune.choice(['truncate_episodes', 'complete_episodes']),
        train_batch_size=tune.choice([5000]),
        model=dict(
            fcnet_activation=tune.choice(['relu', 'elu']),
            fcnet_hiddens=tune.choice(self.network_configurations)
        )
    )

Thanks,
Stefan

Interesting… is it possible to reproduce this on Colab?

Also cc @arturn, @kourosh

This is a proprietary environment, but I can try to replicate it using a classic OpenAI Gym environment. In the meantime, can you tell me if there is anything obvious that I’m doing wrong?

My understanding from the Ray Tune documentation and other comments is that it should be sufficient to create an RLTrainer with ScalingConfig(use_gpu=True), and Ray Tune will then automatically use the available GPU(s) during model training. But maybe I’m missing something.
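
For reference, this is roughly what I imagine a more explicit scaling setup would look like; the num_workers and resources_per_worker values below are placeholders for illustration only, not what I actually ran:

from ray.air.config import ScalingConfig

# Sketch only: a more explicit ScalingConfig. The worker count and
# per-worker resources are placeholder values, not the settings used
# in the run above.
scaling_config = ScalingConfig(
    use_gpu=True,                                  # request GPU resources
    num_workers=2,                                 # placeholder worker count
    resources_per_worker={"CPU": 1, "GPU": 0.5},   # placeholder per-worker resources
)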

Hi @steff, I don’t see anything obvious that may be wrong. Can you share a repro script? Does this also happen when you build the algorithm via algo = config.build() and call algo.train()?
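
Something along these lines (a rough sketch; the environment name and iteration count are placeholders, and your offline-data and off-policy-evaluation settings would still need to be added):

from ray.rllib.algorithms.dqn import DQNConfig

# Rough sketch of the direct-build check. "CartPole-v1" and the loop count
# are placeholders; plug in your own offline input and evaluation settings.
config = (
    DQNConfig()
    .environment("CartPole-v1")   # placeholder env
    .framework("torch")
    .resources(num_gpus=1)        # give the algorithm the GPU directly
)

algo = config.build()
for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])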