TF error when restoring from checkpoint, multi-agent

RickLan · March 29, 2021, 9:50am

Hello,

My goal is to evaluate performance of trained model from checkpoints.
Using tune.run() I trained multi-agents with my own custom environment class. Checkpoint is only at the end of each run.

I’m using the same config dict from the training code in the following eval script:

ray.init()

import ray.rllib.agents.a3c as a3c
agent = a3c.A3CTrainer(config=config, env=AgentEnv)
agent.restore(args.checkpoint_path)

# instantiate env class
env = AgentEnv(env_config)

# run until episode ends
episode_reward = 0
done = False
episode_length = 10
length_count = 0
obs = env.reset()
while not done and (length_count <= episode_length):
    action = agent.compute_action(obs, policy_id="antenna_1", explore=False)
    obs, reward, done, info = env.step(action)
    length_count += 1
    print(length_count, reward)



ray.shutdown()

I am getting the following error message. I’ve shorten the array values to shorten this post.

2021-03-29 08:02:00,955 INFO trainable.py:379 -- Current state after restoring: {'_iteration': 4000, '_timesteps_total
': None, '_time_total': 13776.453395605087, '_episodes_total': 4806}
2021-03-29 08:02:01,009 ERROR tf_run_builder.py:47 -- Error fetching: [<tf.Tensor 'antenna_1/cond_1/Merge:0' shape=(?,
) dtype=int64>, {'action_prob': <tf.Tensor 'antenna_1/Exp:0' shape=(?,) dtype=float32>, 'action_logp': <tf.Tensor 'ant
enna_1/cond_2/Merge:0' shape=(?,) dtype=float32>, 'action_dist_inputs': <tf.Tensor 'antenna_1/model_1/fc_out/BiasAdd:0
' shape=(?, 5) dtype=float32>, 'vf_preds': <tf.Tensor 'antenna_1/Reshape_1:0' shape=(?,) dtype=float32>}], feed_dict={
<tf.Tensor 'antenna_1/obs:0' shape=(?, 301, 9) dtype=float32>: [array({'antenna_0': array([[0.        , 0.3125    , 0.
    0.33333334]]), 'antenna_1': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_2': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_3': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_4': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_5': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_6': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_7': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]]), 'antenna_8': array([[0.        , 0.3125    , 0.375     , ..., 0.        , 0.        ,
    0.        ],
   [0.        , 0.66666669, 0.58333331, ..., 0.25      , 0.16666667,
    0.33333334]])}, dtype=object)], <tf.Tensor 'antenna_1/is_training:0' shape=() dtype=bool>: False, <tf.Tensor '
antenna_1/is_exploring:0' shape=() dtype=bool>: True, <tf.Tensor 'antenna_1/timestep:0' shape=() dtype=int64>: 0}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/tf_run_builder.py", line 44, in get
    self.feed_dict, os.environ.get("TF_TIMELINE_DIR"))
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline
    fetches = sess.run(ops, feed_dict=feed_dict)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1149, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'dict'
Traceback (most recent call last):
  File "./eval_ma.py", line 125, in <module>
    action = agent.compute_action(obs, policy_id="antenna_1")
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 857, in compute_action
    explore=explore)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/policy/policy.py", line 219, in compute_single_action
    timestep=timestep)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/policy/tf_policy.py", line 340, in compute_actions
    fetched = builder.get(to_fetch)
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/tf_run_builder.py", line 48, in get
    raise e
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/tf_run_builder.py", line 44, in get
    self.feed_dict, os.environ.get("TF_TIMELINE_DIR"))
  File "/opt/conda/lib/python3.7/site-packages/ray/rllib/utils/tf_run_builder.py", line 89, in run_timeline
    fetches = sess.run(ops, feed_dict=feed_dict)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1149, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "/opt/conda/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number, not 'dict'

I am using ray v1.2.0. The policy network is the default one. I’m using A3C.

Edit: I’ve traced the code to this: ray/tf_run_builder.py at b87fc1be5505c577f01807fd342e0cdb2e129081 · ray-project/ray · GitHub

But I’m not sure how to debug further. Any help is appreciated!

eoakes · March 29, 2021, 2:22pm

@sven1977 could you take a look? Looks like we should have an action item here to provide a better error message if nothing else

RickLan · March 30, 2021, 1:24am

It turns out the agent.compute_action() expects non-dict observation, but I pass in dict version because the environment expects that. Here it the working code:

import ray.rllib.agents.a3c as a3c
agent = a3c.A3CTrainer(config=config, env=AgentEnv)
agent.restore(args.checkpoint_path)

# instantiate env class
env = AgentEnv(env_config)

# run until episode ends
done = False
episode_length = 50
length_count = 0
obs = env.reset()
while not done and (length_count <= episode_length):

  action_dict = {}
  for policy_id in config["multiagent"]["policies"].keys():
    action_dict[policy_id] = agent.compute_action(obs[policy_id], policy_id=policy_id)

  obs, reward, done_dict, info = env.step(action_dict)
  length_count += 1
  done = done_dict["__all__"]

sven1977 · March 31, 2021, 8:10am

Hey @RickLan thanks for posting this. Could you provide a self-sufficient reproduction script? Your script is missing some imports, config dicts, and environments (you can use the RandomEnv in rllib/examples/env/random_env.py if you need to mock one).

RickLan · April 1, 2021, 1:51am

import ray
import ray.rllib.agents.ppo as ppo
from ray.tune.logger import pretty_print

from ray.rllib.examples.env.random_env import RandomMultiAgentEnv



num_agents = 2
config = ppo.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["env_config"] = {
  "num_agents" : num_agents,
}
env = RandomMultiAgentEnv(config["env_config"])
config["multiagent"] = {
  "policies" : { # (policy_cls, obs_space, act_space, config)
    "{}".format(x): (None, env.observation_space, env.action_space, {}) for x in range(num_agents)
  },
  "policy_mapping_fn": lambda x: "{}".format(x),
}

ray.init()

trainer = ppo.PPOTrainer(config=config, env=RandomMultiAgentEnv)
# Short training
for i in range(3):
  result = trainer.train()
  print(pretty_print(result))

# Evaluate trained model
done = False
count = 0
obs_dict = env.reset()
while not done:
  action_dict = {}
  for policy in config["multiagent"]["policies"].keys():
    # RandomMultiAgentEnv's keys are integers, not strings.
    action_dict[int(policy)] = trainer.compute_action(obs_dict[int(policy)], policy_id=policy)

  obs_dict, reward_dict, done_dict, info = env.step(action_dict)
  count += 1
  done = done_dict["__all__"]
  if done:
    print(count)
    print(reward_dict)
    print(done_dict)


ray.shutdown()

Cheers,

sven1977 · April 10, 2021, 1:50pm

Thanks for the script @RickLan! Let me try to reproduce now …

sven1977 · April 10, 2021, 2:07pm

Hey Rick (@RickLan ). I got the following (slightly modified) script working in the current master. Could you take a look? I’m doing the compute_action calls separately (one per agent) as this is required to specify the correct policy for each of the agents. Also, when calling env.step, I have to recompile the newly calculated action dict.

import ray
import ray.rllib.agents.a3c as a3c

from ray.rllib.examples.env.random_env import RandomMultiAgentEnv

num_agents = 2
config = a3c.DEFAULT_CONFIG.copy()
config["num_workers"] = 1
config["env_config"] = {
  "num_agents" : num_agents,
}
env = RandomMultiAgentEnv(config["env_config"])
config["multiagent"] = {
  "policies" : { # (policy_cls, obs_space, act_space, config)
    "{}".format(x): (None, env.observation_space, env.action_space, {}) for x in range(num_agents)
  },
  "policy_mapping_fn": lambda x: "{}".format(x),
}

ray.init()


import ray.rllib.agents.a3c as a3c
agent = a3c.A3CTrainer(config=config, env=RandomMultiAgentEnv)

# run until episode ends
episode_reward = 0
done = False
episode_length = 10
length_count = 0
obs = env.reset()
while not done and (length_count <= episode_length):
    # NOTE: policy_id are strings ("0" or "1"), agent_id are the respective ints (0 or 1).
    # This is due to your "policies" dict using str keys (to identify the two policies used) in the "multiagent" setup.
    action1 = agent.compute_action(obs[0], policy_id="0", explore=False)
    action2 = agent.compute_action(obs[1], policy_id="1", explore=False)
    obs, reward, done, info = env.step({0: action1, 1: action2})
    length_count += 1
    print(length_count, reward)

RickLan · April 10, 2021, 4:18pm

Hi @sven1977 ,

Thanks for getting back on this.

Looks good. As soon as I did that (TF error when restoring from checkpoint, multi-agent - #3 by RickLan), the tensorflow errors went away. Sorry if I did not make that clear.

I was using v1.2.0. Good to know it still works on the master.

Topic		Replies	Views
Multi agent checkpoints - KeyError: 'default_policy' RLlib	1	594	October 30, 2021
Restore from checkpoint gives tf not present error Checkpointing, Restoring	7	497	January 19, 2023
Get_policy error when get an action from restored trained model- New API stack	12	111	April 22, 2025
RLLib Multiagent: Load only one policy from checkpoint & Compatibility of RLLib/Tune Checkpoints RLlib	9	3293	November 24, 2021
Compute/display actions from ray.tune RLlib	10	1682	March 30, 2021

TF error when restoring from checkpoint, multi-agent

Related topics