Can't Create Config from PPOConfig

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I’ve been stuck on this for a while now. I keep getting a warning that something is wrong with my config, but I have no idea what it could be, especially since I don’t modify `PPOConfig` itself. The full log output is attached below, followed by my code.

2023-08-28 19:15:21,931	WARNING algorithm_config.py:656 -- Cannot create PPOConfig from given `config_dict`! Property __stdout_file__ not supported.
2023-08-28 19:15:21,955	WARNING algorithm_config.py:2558 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
/home/tails/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:484: RayDeprecationWarning: This API is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable PYTHONWARNINGS="ignore::DeprecationWarning"
`UnifiedLogger` will be removed in Ray 2.7.
  return UnifiedLogger(config, logdir, loggers=None)
/home/tails/.conda/envs/ray/lib/python3.9/site-packages/ray/tune/logger/unified.py:53: RayDeprecationWarning: This API is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable PYTHONWARNINGS="ignore::DeprecationWarning"
The `JsonLogger interface is deprecated in favor of the `ray.tune.json.JsonLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
/home/tails/.conda/envs/ray/lib/python3.9/site-packages/ray/tune/logger/unified.py:53: RayDeprecationWarning: This API is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable PYTHONWARNINGS="ignore::DeprecationWarning"
The `CSVLogger interface is deprecated in favor of the `ray.tune.csv.CSVLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
/home/tails/.conda/envs/ray/lib/python3.9/site-packages/ray/tune/logger/unified.py:53: RayDeprecationWarning: This API is deprecated and may be removed in future Ray releases. You could suppress this warning by setting env variable PYTHONWARNINGS="ignore::DeprecationWarning"
The `TBXLogger interface is deprecated in favor of the `ray.tune.tensorboardx.TBXLoggerCallback` interface and will be removed in Ray 2.7.
  self._loggers.append(cls(self.config, self.logdir, self.trial))
2023-08-28 19:15:42,121	WARNING algorithm_config.py:2558 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
2023-08-28 19:15:42,161	WARNING deprecation.py:50 -- DeprecationWarning: `ValueNetworkMixin` has been deprecated. This will raise an error in the future!
2023-08-28 19:15:42,161	WARNING deprecation.py:50 -- DeprecationWarning: `LearningRateSchedule` has been deprecated. This will raise an error in the future!
2023-08-28 19:15:42,161	WARNING deprecation.py:50 -- DeprecationWarning: `EntropyCoeffSchedule` has been deprecated. This will raise an error in the future!
2023-08-28 19:15:42,161	WARNING deprecation.py:50 -- DeprecationWarning: `KLCoeffMixin` has been deprecated. This will raise an error in the future!
/home/tails/.conda/envs/ray/lib/python3.9/site-packages/gymnasium/envs/registration.py:481: UserWarning: WARN: The environment creator metadata doesn't include `render_modes`, contains: ['render.modes']
  logger.warn(
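
Side note on the repeated exploration warning: my reading is that, with the RL Module API enabled, the default exploration config has to be cleared explicitly. The sketch below is only my interpretation of the warning text (not code I actually run), and I don't know whether it is related to the `PPOConfig` warning at all:

from ray.rllib.algorithms.ppo import PPOConfig

cfg = PPOConfig()
# Per the warning text: configs that ship with a default exploration config
# apparently need it cleared like this once the RL Module API is enabled.
cfg.exploration_config = {}
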
Here is my full code:

import os
import random

import click
import gymnasium as gym
import numpy as np
import ray
from ray import air, tune
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env
from ray.tune.stopper import MaximumIterationStopper

from spr_rl.envs.spr_env import SprEnv


os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'


# Click decorators
# TODO: Add testing flag (timestamp and seed). Already set in params
@click.command()
@click.argument('network', type=click.Path(exists=True))
@click.argument('simulator_config', type=click.Path(exists=True))
@click.argument('services', type=click.Path(exists=True))
@click.argument('training_duration', type=int)
@click.option('-s', '--seed', type=int, help="Set the agent's seed", default=None)
@click.option('-t', '--test', help="Path to test timestamp and seed", default=None)
@click.option('-a', '--append_test', help="test after training", is_flag=True)
@click.option('-m', '--model_path', help="path to a model zip file", default=None)
@click.option('-ss', '--sim-seed', type=int, help="simulator seed", default=None)
@click.option('-b', '--best', help="Select the best agent", is_flag=True)
def main(network, simulator_config, services, training_duration,
         seed, test, append_test, model_path, sim_seed, best):
    """
    SPR-RL DRL Scaling and Placement main executable
    """


    # Get or set a seed
    if seed is None:
        seed = random.randint(0, 9999)

    # Seed random generators
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
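    # Hide all GPUs so everything runs on the CPU only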
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

    print(f"Creating agent with seed: {seed}")
    print(f"Using network: {network}")

    param_config = {
        "seed": seed,
        "sim_config": simulator_config,
        "network": network,
        "services": services,
        "training_duration": training_duration,
        "test_mode": test,
        "sim_seed": sim_seed,
        "best": best
    }

    # Settings used for both Stable Baselines and RLlib
    env_name = "SprEnv-v0"
    train_steps = 10000
    learning_rate = 1e-3
    save_dir = "saved_models"

    def register_custom_env():
        # Register the env with Gymnasium so that gym.make("SprEnv-v0") works below
        gym.envs.register(
            id='SprEnv-v0',
            entry_point='spr_rl.envs:SprEnv',
            kwargs={'config': param_config}
        )

    register_custom_env()

    # Register the environment with Tune/RLlib, closing over the param_config above
    register_env(env_name, lambda config: SprEnv(param_config))
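    # (As far as I understand, RLlib resolves the "SprEnv-v0" string through this tune
    # registration, while the Gymnasium registration above is only used by gym.make()
    # for the test episode at the end.)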

    ray.init(
        num_cpus=2,  # change to your available number of CPUs
        include_dashboard=False,
        ignore_reinit_error=True,
        log_to_driver=False,
    )

    config = (
        PPOConfig()
        .environment(env=env_name)
        # RLlib needs one more CPU than configured below (for the driver/trainer?)
        .resources(num_cpus_per_worker=1)
        .rollouts(num_rollout_workers=1)
    )

    # Create the Trainer/Tuner and define how long to train
    tuner = ray.tune.Tuner(
        "PPO",
        run_config=ray.air.RunConfig(
            # Save the training progress and checkpoints locally under the specified subfolder.
            storage_path="./results_rllib",
            # Control training length by setting the number of iterations. 1 iter = 4000 time steps by default.
            stop=MaximumIterationStopper(max_iter=10),
            checkpoint_config=ray.air.CheckpointConfig(checkpoint_at_end=True),
        ),
        param_space=config,
    )

    # Run training and save the result
    result_grid = tuner.fit()
    print(result_grid)

    best_result = result_grid.get_best_result(metric="episode_reward_mean", mode="max")
    print(best_result)
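
    # Restore the best checkpoint as a full Algorithm so it can be queried for actions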
    ppo = Algorithm.from_checkpoint(best_result.checkpoint)
    env = gym.make("SprEnv-v0")
    obs, info = env.reset()
    done = False

    # run one episode with the trained model
    while not done:
        action = ppo.compute_single_action(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated


if __name__ == "__main__":
    network = "inputs/networks/interroute-in2-eg1-rand-cap0-2.graphml"
    services = "inputs/services/abc-start_delay0.yaml"
    sim_config = "inputs/config/simulator/mmpp-12-8.yaml"
    training_duration = "200000"
    main([network, sim_config, services, training_duration, '-a', '-s', '8443'])
    # main([network, agent_config, sim_config, services, training_duration, '-t', '2020-12-03_13:17:26_seed9834'])

    # main([network, agent_config, sim_config, services, training_duration, '--best'])
    # main([network, agent_config, sim_config, services, training_duration, '-t', 'best',
    #       '-m', 'results/models/poisson/model.zip'])
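
One more thing I'm unsure about: is handing the `PPOConfig` object directly to `param_space` correct, or should it be converted to a plain dict first? Below is a variant I was considering, assuming `AlgorithmConfig.to_dict()` is the right method for that (untested, and the built-in CartPole-v1 env is only there to keep the sketch self-contained):

from ray import air, tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.stopper import MaximumIterationStopper

# A throwaway built-in env just to keep this sketch self-contained
config = PPOConfig().environment(env="CartPole-v1").rollouts(num_rollout_workers=1)

tuner = tune.Tuner(
    "PPO",
    run_config=air.RunConfig(
        stop=MaximumIterationStopper(max_iter=1),
        checkpoint_config=air.CheckpointConfig(checkpoint_at_end=True),
    ),
    # The question: should this be the PPOConfig object itself or a plain dict?
    param_space=config.to_dict(),
)
tuner.fit()

Any idea what `Property __stdout_file__ not supported` actually refers to, and how to get rid of the warning?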