How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi everyone, I'm a hardware engineer trying to use RL with Ray and Gymnasium to accelerate analog circuit design.
Currently, I have a custom Gymnasium env that passes gymnasium.utils.env_checker, and I am trying to use it with Ray RLlib. I wrote the training script based on the provided example custom_env.py (https://github.com/ray-project/ray/blob/master/rllib/examples/custom_env.py). The package list (Python 3.10.13) and the script are shown below.
Package Version
---------------------------- ------------
absl-py 2.0.0
aiohttp 3.8.5
aiohttp-cors 0.7.0
aiorwlock 1.3.0
aiosignal 1.3.1
anyio 3.7.1
astunparse 1.6.3
async-timeout 4.0.3
attrs 23.1.0
blessed 1.20.0
cachetools 5.3.1
certifi 2023.7.22
charset-normalizer 3.2.0
click 8.1.7
cloudpickle 2.2.1
colorful 0.5.5
contourpy 1.1.1
cycler 0.11.0
distlib 0.3.7
dm-tree 0.1.8
exceptiongroup 1.1.3
Farama-Notifications 0.0.4
fastapi 0.103.1
filelock 3.12.4
flatbuffers 23.5.26
fonttools 4.42.1
frozenlist 1.4.0
fsspec 2023.9.1
gast 0.4.0
google-api-core 2.11.1
google-auth 2.23.0
google-auth-oauthlib 1.0.0
google-pasta 0.2.0
googleapis-common-protos 1.60.0
gpustat 1.1.1
grpcio 1.58.0
gymnasium 0.29.1
h11 0.14.0
h5py 3.9.0
idna 3.4
imageio 2.31.3
Jinja2 3.1.2
jsonschema 4.19.1
jsonschema-specifications 2023.7.1
keras 2.13.1
kiwisolver 1.4.5
lazy_loader 0.3
libclang 16.0.6
lz4 4.3.2
Markdown 3.4.4
markdown-it-py 3.0.0
MarkupSafe 2.1.2
matplotlib 3.8.0
mdurl 0.1.2
mpmath 1.2.1
msgpack 1.0.6
multidict 6.0.4
networkx 3.0
numpy 1.24.3
nvidia-ml-py 12.535.108
oauthlib 3.2.2
opencensus 0.11.3
opencensus-context 0.1.3
opt-einsum 3.3.0
packaging 23.1
pandas 2.1.1
Pillow 9.3.0
pip 23.2.1
platformdirs 3.10.0
prometheus-client 0.17.1
protobuf 4.24.3
psutil 5.9.5
py-spy 0.3.14
pyarrow 13.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pydantic 1.10.12
Pygments 2.16.1
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2023.3.post1
PyWavelets 1.4.1
PyYAML 6.0.1
ray 2.7.0
referencing 0.30.2
requests 2.31.0
requests-oauthlib 1.3.1
rich 13.6.0
rpds-py 0.10.3
rsa 4.9
scikit-image 0.21.0
scipy 1.11.2
setuptools 68.0.0
six 1.16.0
smart-open 6.4.0
sniffio 1.3.0
starlette 0.27.0
sympy 1.11.1
tensorboard 2.13.0
tensorboard-data-server 0.7.1
tensorboardX 2.6.2.2
tensorflow 2.13.0
tensorflow-estimator 2.13.0
tensorflow-io-gcs-filesystem 0.34.0
termcolor 2.3.0
tifffile 2023.9.18
torch 2.0.1+cpu
torchaudio 2.0.2+cpu
torchvision 0.15.2+cpu
typer 0.9.0
typing_extensions 4.5.0
tzdata 2023.3
urllib3 1.26.16
uvicorn 0.23.2
virtualenv 20.21.0
watchfiles 0.20.0
wcwidth 0.2.6
Werkzeug 2.3.7
wheel 0.38.4
wrapt 1.15.0
yarl 1.9.2
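Before wiring the env into RLlib, I verified it with the Gymnasium checker, roughly like this (a minimal sketch; OpampEnv is the class defined in the training script below and needs our project YAML/Spectre setup to construct):

from gymnasium.utils.env_checker import check_env

env = OpampEnv()
check_env(env, skip_render_check=True)  # raises if the env violates the Gymnasium API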
# Imports (omitted in the original post; these are the standard ones the script uses.
# Project-specific helpers such as define_variable_space, create_action_space,
# run_spectre_simulations, calculate_reward, etc. come from our own modules and are not shown.)
import argparse
import os

import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np

import ray
from ray import air, tune
from ray.rllib.utils.framework import try_import_tf, try_import_torch
from ray.rllib.utils.test_utils import check_learning_achieved
from ray.tune.logger import pretty_print
from ray.tune.registry import get_trainable_cls

tf1, tf, tfv = try_import_tf()
torch, nn = try_import_torch()

parser = argparse.ArgumentParser()
parser.add_argument(
    "--run", type=str, default="PPO", help="The RLlib-registered algorithm to use."
)
parser.add_argument(
    "--framework",
    choices=["tf", "tf2", "torch"],
    default="torch",
    help="The DL framework specifier.",
)
parser.add_argument(
    "--as-test",
    action="store_true",
    help="Whether this script should be run as a test: --stop-reward must "
    "be achieved within --stop-timesteps AND --stop-iters.",
)
parser.add_argument(
    "--stop-iters", type=int, default=50, help="Number of iterations to train."
)
parser.add_argument(
    "--stop-timesteps", type=int, default=100000, help="Number of timesteps to train."
)
parser.add_argument(
    "--stop-reward", type=float, default=0.1, help="Reward at which we stop training."
)
parser.add_argument(
    "--no-tune",
    action="store_true",
    help="Run without Tune using a manual train loop instead. In this case, "
    "use PPO without grid search and no TensorBoard.",
)
parser.add_argument(
    "--local-mode",
    action="store_true",
    help="Init Ray in local mode for easier debugging.",
)
class OpampEnv(gym.Env):
    metadata = {"render_modes": ["human"], "render_fps": 30}

    def __init__(self, render_mode="human"):
        # Load Essential YAML files
        # Ignored
        # Init variables
        self.reward = []
        self.curr_reward = 0
        self.step_count = 0
        self.working_root_dict = "/simulation"
        # Define action space
        variable_space, self.variable_magnitude = define_variable_space(value_range_yaml_path)
        self.action_space = create_action_space(variable_space)
        print(f"""Action space: {self.action_space}""")
        print(f"""Variable magnitude: {self.variable_magnitude}""")
        # Define observation space
        self.obs_dim, self.obs_typical_vals = define_param_space(config_performance_path)
        self.observation_space = create_obs_space(self.obs_dim)
        # assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode
        plt.ion()

    def reset(self, seed=None, options=None):
        # We need the following line to seed self.np_random
        super().reset(seed=seed)
        # Randomly select the observation
        reset_obs = self.np_random.uniform(low=-1.0, high=1.0, size=(self.obs_dim,))
        reset_obs = reset_obs.astype(np.float32)
        observation = reset_obs
        info = {}
        if self.render_mode == "human":
            self.render()
        return observation, info

    def step(self, action):
        self.step_count += 1
        print(f"Step {self.step_count}! Action: {action}")
        # Step 1: Create a new working directory
        working_dir = create_working_directory(self.working_root_dict)
        print(f"Create working directory: {working_dir}")
        # Step 2: Translate action from index to variable values,
        # Update config_value.yaml based on template and variable_dict
        assigned_netlist_config_path = working_dir + '/config_value_assign.yaml'
        generate_assign_config(action, self.config_value_range, self.config_value_template,
                               assigned_netlist_config_path)
        # Step 3: Run Spectre simulations
        sim_results = run_spectre_simulations(working_dir, self.config_simulation, self.unassigned_netlist)
        print(f"Results: {sim_results}")
        # Step 4: Replace possible None with nan
        sim_results = replace_none_with_nan(sim_results)
        print(f"Results(replaced): {sim_results}")
        save_results(sim_results, working_dir)
        # Step 5: Calculate the reward
        result_path = working_dir + "/results.yaml"
        self.curr_reward = calculate_reward(result_path, self.config_performance_metric)
        print(f"Reward: {self.curr_reward}")
        self.reward.append(self.curr_reward)
        print(f"Reward list: {self.reward}")
        # Step 6: Normalize the observation
        observation_norm = normalize_results(sim_results, self.obs_typical_vals)
        observation_norm = observation_norm.astype(np.float32)
        print(f"Normalized Result: {observation_norm}")
        # Step 7: Check if the episode is done
        terminated = self.curr_reward == 10
        if terminated:
            print(f"Episode terminated! Reward: {self.curr_reward}, Step count: {self.step_count}")
            self.step_count = 0
        info = {}
        if self.render_mode == "human":
            self.render()
        return observation_norm, self.curr_reward, terminated, False, info

    def render(self):
        render_modes = self.render_mode
        if not self.reward:
            print("No reward to plot!")
            return None
        plt.scatter(list(range(len(self.reward))), self.reward)
        plt.xlabel('Iteration')
        plt.ylabel('Reward')
        plt.draw()
        plt.pause(0.001)
        return None
if __name__ == "__main__":
    args = parser.parse_args()
    print(f"Running with following CLI options: {args}")
    ray.init(local_mode=args.local_mode)
    # Can also register the env creator function explicitly with:
    # register_env("corridor", lambda config: SimpleCorridor(config))
    config = (
        get_trainable_cls(args.run)
        .get_default_config()
        .environment(OpampEnv)
        .framework(args.framework)
        .rollouts(num_rollout_workers=10)
        # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.
        .resources(num_gpus=int(os.environ.get("RLLIB_NUM_GPUS", "0")))
    )
    stop = {
        "training_iteration": args.stop_iters,
        "timesteps_total": args.stop_timesteps,
        "episode_reward_mean": args.stop_reward,
    }
    if args.no_tune:
        # manual training with train loop using PPO and fixed learning rate
        if args.run != "PPO":
            raise ValueError("Only support --run PPO with --no-tune.")
        print("Running manual train loop without Ray Tune.")
        # use fixed learning rate instead of grid search (needs tune)
        config.lr = 1e-3
        algo = config.build()
        # run manual training loop and print results after each iteration
        for _ in range(args.stop_iters):
            result = algo.train()
            print(pretty_print(result))
            # stop training if the target train steps or reward are reached
            if (
                result["timesteps_total"] >= args.stop_timesteps
                or result["episode_reward_mean"] >= args.stop_reward
            ):
                break
        algo.stop()
    else:
        # automated run with Tune and grid search and TensorBoard
        print("Training automatically with Ray Tune")
        tuner = tune.Tuner(
            args.run,
            param_space=config.to_dict(),
            run_config=air.RunConfig(stop=stop),
        )
        results = tuner.fit()
    if args.as_test:
        print("Checking if learning goals were achieved")
        check_learning_achieved(results, args.stop_reward)
    ray.shutdown()
I run the above script via the CLI: python train_Opamp_Env.py --run PPO --framework torch --stop-iters 200 --stop-timesteps 100 --stop-reward -0.02 --local-mode. Several problems occur, which are listed below:
- Trials did not complete; the related log is shown below:
[2023-10-08 00:06:43,408 E 36878 36878] core_worker.cc:1716: Pushed Error with JobID: 01000000 of type: task with message: ray::PPO.train() (pid=36878, ip=10.16.20.158, actor_id=145d89b07c4158a9a05419c101000000, repr=PPO)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 400, in train
raise skipped from exception_cause(skipped)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 397, in train
result = self.step()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 853, in step
results, train_iter_ctx = self._run_one_training_iteration()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2838, in _run_one_training_iteration
results = self.training_step()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 448, in training_step
train_results = self.learner_group.update(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 184, in update
self._learner.update(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner.py", line 1304, in update
) = self._update(nested_tensor_minibatch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 365, in _update
return self._possibly_compiled_update(batch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 123, in _uncompiled_update
loss_per_module = self.compute_loss(fwd_out=fwd_out, batch=batch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner.py", line 1024, in compute_loss
loss = self.compute_loss_for_module(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_learner.py", line 87, in compute_loss_for_module
action_kl = prev_action_dist.kl(curr_action_dist)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/models/torch/torch_distributions.py", line 327, in kl
for cat, oth_cat in zip(self._cats, other.cats)
AttributeError: '<class 'ray.rllib.models.torch.torch_distributions' object has no attribute 'cats' at time: 1.69669e+09
Trial status: 1 RUNNING
Current time: 2023-10-08 00:06:43. Total running time: 7hr 53min 53s
Logical resource usage: 41.0/48 CPUs, 0/0 GPUs
Trial name                  status
PPO_OpampEnv_4e58f_00000    RUNNING
2023-10-08 00:06:43,440 ERROR tune_controller.py:1502 -- Trial task failed for trial PPO_OpampEnv_4e58f_00000
Traceback (most recent call last):
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
result = ray.get(future)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/_private/worker.py", line 2547, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PPO.train() (pid=36878, ip=10.16.20.158, actor_id=145d89b07c4158a9a05419c101000000, repr=PPO)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 400, in train
raise skipped from exception_cause(skipped)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 397, in train
result = self.step()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 853, in step
results, train_iter_ctx = self._run_one_training_iteration()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2838, in _run_one_training_iteration
results = self.training_step()
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 448, in training_step
train_results = self.learner_group.update(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner_group.py", line 184, in update
self._learner.update(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner.py", line 1304, in update
) = self._update(nested_tensor_minibatch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 365, in _update
return self._possibly_compiled_update(batch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/torch/torch_learner.py", line 123, in _uncompiled_update
loss_per_module = self.compute_loss(fwd_out=fwd_out, batch=batch)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/core/learner/learner.py", line 1024, in compute_loss
loss = self.compute_loss_for_module(
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/torch/ppo_torch_learner.py", line 87, in compute_loss_for_module
action_kl = prev_action_dist.kl(curr_action_dist)
File "/home/userName/anaconda3/envs/gym_env/lib/python3.10/site-packages/ray/rllib/models/torch/torch_distributions.py", line 327, in kl
for cat, oth_cat in zip(self._cats, other.cats)
AttributeError: '<class 'ray.rllib.models.torch.torch_distributions' object has no attribute 'cats'
Trial PPO_OpampEnv_4e58f_00000 errored after 0 iterations at 2023-10-08 00:06:43. Total running time: 7hr 53min 53s
Error file: /home/userName/ray_results/PPO_2023-10-07_16-12-49/PPO_OpampEnv_4e58f_00000_0_2023-10-07_16-12-50/error.txt
Trial status: 1 ERROR
Current time: 2023-10-08 00:06:43. Total running time: 7hr 53min 53s
Logical resource usage: 0/48 CPUs, 0/0 GPUs
Trial name                  status
PPO_OpampEnv_4e58f_00000    ERROR
Number of errored trials: 1
Trial name                  # failures   error file
PPO_OpampEnv_4e58f_00000    1            /home/userName/ray_results/PPO_2023-10-07_16-12-49/PPO_OpampEnv_4e58f_00000_0_2023-10-07_16-12-50/error.txt
2023-10-08 00:06:43,806 ERROR tune.py:1139 -- Trials did not complete: [PPO_OpampEnv_4e58f_00000]
- While I can see the reward and agent updates via "print", TensorBoard cannot access any data from this training run. The folder structure looks like this (the way I launch TensorBoard is sketched after the tree):
├── PPO_OpampEnv_ce6ef_00000_0_2023-10-07_16-02-06
│ ├── events.out.tfevents.1696666029.eex
│ ├── params.json
│ ├── params.pkl
│ ├── result.json
├── basic-variant-state-2023-10-07_16-02-05.json
├── experiment_state-2023-10-07_16-02-05.json
└── tuner.pkl
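For completeness, I launch TensorBoard against the results root roughly like this (the path is from my setup and may differ):

tensorboard --logdir /home/userName/ray_results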
Since no one on our team has attempted anything similar before, I don't have a clear idea of how to solve these problems.
Thanks in advance :)