The “trajectory_view_api” example does not work with the DQN algorithm; the program crashes with an error

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

When I run the default “trajectory_view_api” example, whose file path is

“C:\ProgramData\Anaconda3\Lib\site-packages\ray\rllib\examples\trajectory_view_api.py”

the default algorithm is PPO and the program runs normally.
However, when I change the default algorithm to DQN, the program no longer runs and the error shown below occurs (full traceback pasted further down).
I changed:

"--run", type=str, default="PPO", help="The RLlib-registered algorithm to use."

to:

"--run", type=str, default="DQN", help="The RLlib-registered algorithm to use."

and I removed (commented out) the following two PPO-specific parameters:

# "num_sgd_iter": 5,
# "vf_loss_coeff": 0.0001,
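
To make the change easier to see at a glance, here is a condensed version of what the script ends up running after my edits (the custom model registration and the environment are kept exactly as in the example; I only changed the algorithm string and dropped the two PPO-only keys):

import ray
from ray import tune
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.rllib.examples.models.trajectory_view_utilizing_models import (
    FrameStackingCartPoleModel,
)
from ray.rllib.models.catalog import ModelCatalog

ray.init(num_cpus=3)
ModelCatalog.register_custom_model("frame_stack_model", FrameStackingCartPoleModel)

results = tune.run(
    "DQN",  # was "PPO" in the original example, which trains fine
    config={
        "env": StatelessCartPole,
        "framework": "tf",
        "model": {
            "custom_model": "frame_stack_model",
            "custom_model_config": {"num_frames": 16},
        },
        # "num_sgd_iter" and "vf_loss_coeff" (PPO-only keys) removed.
    },
    stop={"training_iteration": 50},
    verbose=2,
)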

I don’t understand the cause of the error. Is it because DQN is an off-policy algorithm and therefore does not support the trajectory_view_api, or is it something else? If you know the specific reason, I would really appreciate your answer.
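
For reference, my understanding is that the example's custom model requests the extra "prev_n_obs" column through the Trajectory View API roughly like this (my own simplified sketch of the mechanism, not a copy of trajectory_view_utilizing_models.py):

from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class FrameStackSketchModel(TFModelV2):
    """Sketch: a model that asks RLlib for the last num_frames observations."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name,
                 num_frames=16):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        self.num_frames = num_frames
        # Trajectory View API part: declare an extra input column that should
        # hold the previous `num_frames` observations of the trajectory.
        self.view_requirements["prev_n_obs"] = ViewRequirement(
            data_col="obs",
            shift="-{}:0".format(num_frames - 1),
            space=obs_space,
        )

    def forward(self, input_dict, state, seq_lens):
        # This lookup is what fails in my DQN run with KeyError: 'prev_n_obs';
        # the column is apparently never added to the dummy batch DQN builds
        # its loss from.
        stacked_obs = input_dict["prev_n_obs"]  # shape [B, num_frames, obs_dim]
        # (the real example model flattens the stacked frames and produces
        # logits and a value output here)
        raise NotImplementedError("sketch only")

If that is the mechanism, my question boils down to why the DQN loss initialization does not honor the model's extra view requirement the way PPO does.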

Here is the error:

2022-08-05 10:51:56,358 ERROR ray_trial_executor.py:102 -- An exception occurred when trying to stop the Ray actor:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\tune\ray_trial_executor.py", line 93, in post_stop_cleanup
    ray.get(future, timeout=0)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\worker.py", line 1811, in get
    raise value
  File "python\ray\_raylet.pyx", line 797, in ray._raylet.task_execution_handler
  File "python\ray\_raylet.pyx", line 616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 760, in ray._raylet.execute_task
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::DQNTrainer.__init__() (pid=10264, ip=127.0.0.1, repr=DQNTrainer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\agents\trainer.py", line 1035, in _init
    raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

ray::DQNTrainer.__init__() (pid=10264, ip=127.0.0.1, repr=DQNTrainer)
  File "python\ray\_raylet.pyx", line 656, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 697, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 663, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 667, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 614, in ray._raylet.execute_task.function_executor
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\_private\function_manager.py", line 701, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\agents\trainer.py", line 830, in __init__
    super().__init__(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\tune\trainable.py", line 149, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\agents\trainer.py", line 911, in setup
    self.workers = WorkerSet(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 162, in __init__
    self._local_worker = self._make_worker(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 567, in _make_worker
    worker = cls(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 626, in __init__
    self._build_policy_map(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1722, in _build_policy_map
    self.policy_map.create_policy(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\policy_map.py", line 140, in create_policy
    self[policy_id] = class_(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\tf_policy_template.py", line 256, in __init__
    DynamicTFPolicy.__init__(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\dynamic_tf_policy.py", line 439, in __init__
    self._initialize_loss_from_dummy_batch(auto_remove_unneeded_view_reqs=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\dynamic_tf_policy.py", line 758, in _initialize_loss_from_dummy_batch
    losses = self._do_loss_init(train_batch)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\dynamic_tf_policy.py", line 867, in _do_loss_init
    losses = self._loss_fn(self, self.model, self.dist_class, train_batch)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\agents\dqn\dqn_tf_policy.py", line 251, in build_q_losses
    q_t, q_logits_t, q_dist_t, _ = compute_q_values(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\agents\dqn\dqn_tf_policy.py", line 390, in compute_q_values
    model_out, state = model(input_batch, state_batches or [], seq_lens)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\models\modelv2.py", line 251, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\examples\models\trajectory_view_utilizing_models.py", line 63, in forward
    obs = tf.cast(input_dict["prev_n_obs"], tf.float32)
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\rllib\policy\sample_batch.py", line 744, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'prev_n_obs'

(DQNTrainer pid=10264) 2022-08-05 10:51:56,333 ERROR worker.py:449 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::DQNTrainer.__init__() (pid=10264, ip=127.0.0.1, repr=DQNTrainer)
(The worker log then repeats the same traceback as above, each line prefixed with "(DQNTrainer pid=10264)".)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Lib\site-packages\ray\rllib\examples\trajectory_view_api.py", line 85, in <module>
    results = tune.run(
  File "C:\ProgramData\Anaconda3\lib\site-packages\ray\tune\tune.py", line 695, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [DQN_StatelessCartPole_8dd05_00000])
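
To double-check that the extra column really is declared when the algorithm works, I tried inspecting the policy's view requirements on the PPO side (a rough diagnostic sketch; I am assuming get_policy().view_requirements is the right thing to look at):

import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.rllib.examples.models.trajectory_view_utilizing_models import (
    FrameStackingCartPoleModel,
)
from ray.rllib.models.catalog import ModelCatalog

ray.init(num_cpus=1)
ModelCatalog.register_custom_model("frame_stack_model", FrameStackingCartPoleModel)

ppo = PPOTrainer(
    config={
        "env": StatelessCartPole,
        "framework": "tf",
        "num_workers": 0,
        "model": {
            "custom_model": "frame_stack_model",
            "custom_model_config": {"num_frames": 16},
        },
    }
)
# With PPO this builds fine and "prev_n_obs" shows up among the declared columns;
# constructing DQNTrainer with the same config crashes as in the traceback above.
print(sorted(ppo.get_policy().view_requirements.keys()))
ray.shutdown()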

Here is the code of “trajectory_view_api.py” after my changes:

import argparse
import numpy as np

import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.examples.env.stateless_cartpole import StatelessCartPole
from ray.rllib.examples.models.trajectory_view_utilizing_models import (
FrameStackingCartPoleModel,
TorchFrameStackingCartPoleModel,
)
from ray.rllib.models.catalog import ModelCatalog
from ray.rllib.utils.framework import try_import_tf
from ray.rllib.utils.test_utils import check_learning_achieved
from ray import tune

tf1, tf, tfv = try_import_tf()

parser = argparse.ArgumentParser()
parser.add_argument(
    "--run", type=str, default="DQN", help="The RLlib-registered algorithm to use."
)
parser.add_argument(
    "--framework",
    choices=["tf", "tf2", "tfe", "torch"],
    default="tf",
    help="The DL framework specifier.",
)
parser.add_argument(
    "--as-test",
    action="store_true",
    help="Whether this script should be run as a test: --stop-reward must "
    "be achieved within --stop-timesteps AND --stop-iters.",
)
parser.add_argument(
    "--stop-iters", type=int, default=50, help="Number of iterations to train."
)
parser.add_argument(
    "--stop-timesteps", type=int, default=200000, help="Number of timesteps to train."
)
parser.add_argument(
    "--stop-reward", type=float, default=150.0, help="Reward at which we stop training."
)

if __name__ == "__main__":
    args = parser.parse_args()
    ray.init(num_cpus=3)

    num_frames = 16

    ModelCatalog.register_custom_model(
        "frame_stack_model",
        FrameStackingCartPoleModel
        if args.framework != "torch"
        else TorchFrameStackingCartPoleModel,
    )

    config = {
        "env": StatelessCartPole,
        "model": {
            "vf_share_layers": True,
            "custom_model": "frame_stack_model",
            "custom_model_config": {
                "num_frames": num_frames,
            },
            # To compare against a simple LSTM:
            # "use_lstm": True,
            # "lstm_use_prev_action": True,
            # "lstm_use_prev_reward": True,
            # To compare against a simple attention net:
            # "use_attention": True,
            # "attention_use_n_prev_actions": 1,
            # "attention_use_n_prev_rewards": 1,
        },
        # "num_sgd_iter": 5,
        # "vf_loss_coeff": 0.0001,
        "framework": args.framework,
    }

    stop = {
        "training_iteration": args.stop_iters,
        "timesteps_total": args.stop_timesteps,
        "episode_reward_mean": args.stop_reward,
    }
    results = tune.run(
        args.run, config=config, stop=stop, verbose=2, checkpoint_at_end=True
    )

    if args.as_test:
        check_learning_achieved(results, args.stop_reward)

    checkpoints = results.get_trial_checkpoints_paths(
        trial=results.get_best_trial("episode_reward_mean", mode="max"),
        metric="episode_reward_mean",
    )

    checkpoint_path = checkpoints[0][0]
    trainer = PPOTrainer(config)
    trainer.restore(checkpoint_path)

    # Inference loop.
    env = StatelessCartPole()

    # Run manual inference loop for n episodes.
    for _ in range(10):
        episode_reward = 0.0
        reward = 0.0
        action = 0
        done = False
        obs = env.reset()
        while not done:
            # Create a dummy action using the same observation n times,
            # as well as dummy prev-n-actions and prev-n-rewards.
            action, state, logits = trainer.compute_single_action(
                input_dict={
                    "obs": obs,
                    "prev_n_obs": np.stack([obs for _ in range(num_frames)]),
                    "prev_n_actions": np.stack([0 for _ in range(num_frames)]),
                    "prev_n_rewards": np.stack([1.0 for _ in range(num_frames)]),
                },
                full_fetch=True,
            )
            obs, reward, done, info = env.step(action)
            episode_reward += reward

        print(f"Episode reward={episode_reward}")

    ray.shutdown()

@sven1977 @mannyv @arturn
I’m sorry to bother you, but I’m really anxious to solve this problem. Could you please take some time to look at it?