Question about defining config in Ray > 2.0.0 from 1.0.0

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am trying to migrate code from Ray RLlib 1.0.0 to >2.0.0.
I am trying to replace certain parts of the file climate-cooperation-competition/train_with_rllib.py at 79cdcfa08976c58aa20a6cc0722bc30420615be9 · mila-iqia/climate-cooperation-competition · GitHub

I was wondering where in the new-style config I should put the following, which is the run_config loaded on line 408 of train_with_rllib:

{'saving': {'metrics_log_freq': 100,
  'model_params_save_freq': 1000,
  'basedir': '/tmp',
  'name': 'rice',
  'tag': 'experiments'},
 'trainer': {'num_envs': 20,
  'rollout_fragment_length': 100,
  'train_batch_size': 2000,
  'num_episodes': 100,
  'framework': 'torch',
  'num_workers': 4,
  'num_gpus': 0},
 'env': {'num_discrete_action_levels': 10,
  'negotiation_on': False,
  'source_dir': None},
 'policy': {'regions': {'vf_loss_coeff': 0.1,
   'entropy_coeff_schedule': [[0, 0.5], [1000000, 0.1], [5000000, 0.05]],
   'clip_grad_norm': True,
   'max_grad_norm': 0.5,
   'gamma': 0.92,
   'lr': 0.0005,
   'model': {'custom_model': 'torch_linear',
    'custom_model_config': {'fc_dims': [256, 256]}}}}}

So far, the following code works, but it does not contain all the config parameters that replaced the trainer definitions on lines 307 and 420 of train_with_rllib:

# try https://docs.ray.io/en/latest/ray-air/trainer.html 
from ray.air.config import RunConfig, ScalingConfig
from ray.train.rl import RLTrainer

trainer = RLTrainer(
    run_config=RunConfig(stop={"training_iteration": 5}),
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    algorithm="PPO",
    config={
        "env": EnvWrapper,
        "framework": "tf",
        "evaluation_num_workers": 4,
        "evaluation_interval": 1
    },
)

Hi @SY2567, yes, there have been a couple of changes since the introduction of Ray 2.0.0.
Configuration of algorithms is now handled by the AlgorithmConfig class, which is created via its constructor and sets parameters through class methods. From it you can build either a configuration dictionary (config.to_dict()) or the algorithm itself (config.build()).
Take a look at the class definition here.
Each specific algorithm adds parameters to the default configuration by inheriting from AlgorithmConfig, e.g. PPOConfig inherits from AlgorithmConfig and adds PPO-specific parameters for the loss coefficients and minibatch SGD.
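
For illustration, here is a minimal sketch of both ways to consume the config (using "CartPole-v1" as a stand-in environment rather than your EnvWrapper):

from ray.rllib.algorithms.ppo.ppo import PPOConfig

# Configure via the fluent class methods.
config = PPOConfig().environment(env="CartPole-v1").framework("torch")

# Option 1: a plain config dict, e.g. to pass to tune.Tuner(param_space=...).
config_dict = config.to_dict()

# Option 2: build the Algorithm directly and train it yourself.
algo = config.build()
print(algo.train()["episode_reward_mean"])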

For example, a single-agent setup looks as follows (as a little task to get familiar with it, reconfigure it for MARL ;)). A multi-agent sketch follows the Tuner example below:

import ray
from ray import air, tune
from ray.rllib.algorithms.ppo.ppo import PPOConfig

config = (
    PPOConfig()
    .rollouts(
        num_rollout_workers=4,
        rollout_fragment_length=100,
    )
    .framework(
        framework="torch",
    )
    .training(
        train_batch_size=2000,
        gamma=0.92,
        lr=5e-4,
        # Gradient clipping (replaces clip_grad_norm / max_grad_norm from the old config).
        grad_clip=0.5,
        model={
            "custom_model": "torch_linear",
        },
    )
    .environment(
        env=EnvWrapper,
        env_config={...},
    )
)
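
To cover the remaining entries of your old run_config (num_envs, num_gpus, vf_loss_coeff, entropy_coeff_schedule, custom_model_config), they would roughly map onto the same fluent API as in the sketch below; please double-check the argument names against the AlgorithmConfig docs for your exact Ray version:

# Sketch: mapping the remaining old run_config entries onto AlgorithmConfig.
# Assumes "torch_linear" is still registered via ModelCatalog.register_custom_model,
# as in the original script.
config = (
    config
    .rollouts(num_envs_per_worker=20)   # old "num_envs"
    .resources(num_gpus=0)              # old "num_gpus"
    .training(
        vf_loss_coeff=0.1,
        entropy_coeff_schedule=[[0, 0.5], [1000000, 0.1], [5000000, 0.05]],
        model={
            "custom_model": "torch_linear",
            "custom_model_config": {"fc_dims": [256, 256]},
        },
    )
)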

stop = {
    # The stop criterion must be a reported result metric; "episodes_total"
    # corresponds to the old "num_episodes" setting.
    "episodes_total": 200,
}

ray.init() 
tuner = tune.Tuner(
    "PPO",
    # This is important, it needs a dict:
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=1,
        local_dir="~/ray_results/TestNewRay",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=10,
            checkpoint_at_end=True,
        )
    )
)
tuner.fit()
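
And since the original script is actually multi-agent (a shared "regions" policy for all region agents), the MARL reconfiguration mentioned above would roughly look like the following sketch; the policy name and the single-policy mapping are assumptions taken from your old run_config:

from ray.rllib.policy.policy import PolicySpec

config = config.multi_agent(
    # One shared "regions" policy, as in the old run_config.
    policies={"regions": PolicySpec()},
    # Map every agent id to that shared policy.
    policy_mapping_fn=lambda agent_id, *args, **kwargs: "regions",
)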

The RLTrainer example you show is where RLlib is headed: RLlib will use trainers configured via Ray Train and condense more of the algorithm architecture and rollout distribution into them.
