Dear @Lars_Simon_Zehnder,
I initially installed the package by following the documentation. However, it didn't seem to work, possibly due to the version of Python I was using at the time. Now, I can successfully train my model. The first option you provided worked for me.
After carefully reviewing the configurations, I updated the cartpole1-ppo.yaml file as shown below:
cartpole-ppo:
    env: custom_cartpole_env.CustomCartPole
    run: PPO
    stop:
        episode_reward_mean: 150
        timesteps_total: 100000
    config:
        # Works for both torch and tf.
        framework: torch
        gamma: 0.99
        lr: 0.0003
        num_workers: 1
        observation_filter: MeanStdFilter
        num_sgd_iter: 6
        vf_loss_coeff: 0.01
        model:
            fcnet_hiddens: [32]
            fcnet_activation: linear
            vf_share_layers: true
        enable_connectors: true
The custom_cartpole_env.py file contains the following lines:
import gymnasium as gym


class CustomCartPole(gym.Env):
    def __init__(self, env_config=None):
        self.env = gym.make('CartPole-v1')
        self.action_space = self.env.action_space
        self.observation_space = self.env.observation_space

    def reset(self, *, seed=None, options=None):
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated, truncated, info
Training works well using the command rllib train file cartpole1-ppo.yaml, but I encounter issues during evaluation. The error message is as follows:
EnvError: The env string you provided ('custom_cartpole_env.CustomCartPole') is:
a) Not a supported/installed environment.
b) Not a tune-registered environment creator.
c) Not a valid env class string.
...
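As far as I understand, option c) means the string could not be imported as a module name followed by a class name. The snippet below is only a minimal sketch of that kind of lookup using the standard library. It is my assumption about the mechanism, not RLlib's actual code, and resolve_class_string is a hypothetical helper name. It shows that such a string resolves only if the module can be found on sys.path in the process doing the lookup.

```python
import importlib


def resolve_class_string(dotted_path):
    """Resolve a 'module.ClassName' string to the class object it names.

    Hypothetical helper for illustration; not part of RLlib.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"{dotted_path!r} is not of the form 'module.ClassName'")
    # import_module searches sys.path; a module in a directory that is not
    # on sys.path (e.g. not listed in PYTHONPATH) will not be found.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Works for any importable module, e.g. one from the standard library.
ordered_dict_cls = resolve_class_string("collections.OrderedDict")

# Fails for a module that cannot be found on sys.path.
try:
    resolve_class_string("module_that_is_not_importable.SomeEnv")
except ModuleNotFoundError as exc:
    print(f"could not resolve: {exc}")
```

If this is what happens during evaluation, then custom_cartpole_env would need to be importable from wherever the evaluation process runs, but I have not confirmed that this is the cause.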
When training completes, I receive the expected message shown below. The error above occurs only when I then run the suggested evaluation command:
2023-04-06 12:16:39,770 INFO tune.py:798 -- Total run time: 62.62 seconds (62.36 seconds for the tuning loop).
Your training finished.
Best available checkpoint for each trial:
/home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007
You can now evaluate your trained algorithm from any checkpoint, e.g. by running:
    rllib evaluate /home/../ray_results/cartpole-ppo/PPO_custom_cartpole_env.CustomCartPole_a3565_00000_0_2023-04-06_12-15-37/checkpoint_000007 --algo PPO