I am trying to build a complex exploration algorithm for RLlib. Because I add an exploration loss to the policy loss, there is a code section where the sample batch contains Tensors rather than NumPy arrays. In this section the Tensors cannot be evaluated (which is needed for metrics) in static graph mode.
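Roughly, the failing pattern looks like the following sketch (the batch key and function name are only placeholders, not the actual code from my branch):

import tensorflow as tf

def mean_intrinsic_reward_metric(sample_batch):
    # In eager mode this entry is a NumPy array or EagerTensor and can be
    # converted to a Python float directly.
    rewards = sample_batch["intrinsic_rewards"]
    # In static graph mode this fails, because the entry is a symbolic Tensor
    # that only gets a concrete value inside a session.run() call.
    return float(tf.reduce_mean(rewards))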
I already tried to wrap my function with @make_tf_callable, but that does not help because it also needs a feed_dict, which is not available at this point. Furthermore, in eager_tracing mode there is no graph available in the policies; I guess in that case the functions would have to be wrapped into tf.function().
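For reference, this is roughly how I tried to use make_tf_callable (a simplified sketch with placeholder names, not the exact code from my branch; the import path may differ between Ray versions):

import tensorflow as tf
from ray.rllib.utils.tf_utils import make_tf_callable

class MyExploration:  # placeholder for my actual exploration class
    def __init__(self, policy):
        # policy.get_session() returns a tf1.Session in static graph mode,
        # but there is no session/graph under eager or eager_tracing, which
        # is where this approach breaks down.
        @make_tf_callable(policy.get_session())
        def _mean_intrinsic_reward(rewards):
            return tf.reduce_mean(rewards)

        self._mean_intrinsic_reward = _mean_intrinsic_reward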
See my PR for an example.
Is there any way these Tensors can be evaluated?
Feel free to use the following script to run an example from my branch:
import os
import ray
from ray import tune
from ray.rllib.algorithms import ppo
from ray.rllib.utils.exploration.callbacks import RNDMetricsCallbacks
config = (
    ppo.PPOConfig()
    .environment(
        env="FrozenLake-v1",
    )
    .framework(
        framework="tf",
        # switch eager tracing on to see that no session is available
        # in this mode.
        # eager_tracing=True,
    )
    .training(
        num_sgd_iter=8,
    )
    .rollouts(
        num_envs_per_worker=4,
        num_rollout_workers=0,
    )
    .debugging(
        log_level="DEBUG",
        seed=2,
    )
    .exploration(
        exploration_config={
            "type": "RND",
            "embed_dim": 64,
            "lr": 0.0001,
            "intrinsic_reward_coeff": 0.005,
            "nonepisodic_returns": True,
            "sub_exploration": {
                "type": "StochasticSampling",
            },
        },
    )
    # .callbacks(RNDMetricsCallbacks)
)
ray.init(ignore_reinit_error=True, local_mode=True)
# Trace TensorFlow, if needed.
# os.mkdir("/tmp/tf_timeline_test")
# os.environ["TF_TIMELINE_DIR"] = "/tmp/tf_timeline_test"
algorithm = config.build()
for i in range(10):
    print(f"========================================{i}===========================================")
    algorithm.train()
ray.shutdown()
Maybe @sven1977 or Jun Gong has an answer to this (I know you both work with TensorFlow)?