Thanks for the response @mannyv.
Based on your comment, I first tried the following code. As you can see, I got rid of the ray decorator from the `Server` class and added it to just the `start` method, then wrapped a `ray.get` around `start.remote()` as recommended.
```python
import os
import ray


class TBLogger:
    def __init__(self):
        self.tensorboard_loggers = {}
        print("Init TBLogger.")

    def init_logger(self):
        import tensorflow as tf
        for policy_num in range(6):
            eval_name = "EVAL-DEBUG"
            tensorboard_log_dir = os.path.join(os.getcwd(), "../LOGS", eval_name, "EVAL_" + str(policy_num))
            self.tensorboard_loggers[policy_num] = tf.summary.create_file_writer(tensorboard_log_dir)
            print(f"TB Logger for eval-{policy_num} made!")

    def log(self, iter):
        import tensorflow as tf
        for policy_writer_num in range(6):
            with self.tensorboard_loggers[policy_writer_num].as_default():
                tf.summary.scalar(name="Reward", data=iter + 10, step=iter)
                tf.summary.scalar(name="Utilization", data=iter + 20, step=iter)
                tf.summary.scalar(name="A-Taping", data=iter + 30, step=iter)
                tf.summary.scalar(name="C-Packing", data=iter + 40, step=iter)
                tf.summary.scalar(name="RTF", data=iter + 50, step=iter)
                print(f"Logged iteration {iter} for policy-{policy_writer_num}.")
                self.tensorboard_loggers[policy_writer_num].flush()


class Server:
    def __init__(self):
        print("Server class")

    def init_logger(self):
        self.logger = TBLogger()
        self.logger.init_logger()

    @ray.remote
    def start(self):
        for i in range(100):
            self.logger.log(i)


if __name__ == "__main__":
    ray.init()
    server = Server()
    server.init_logger()
    ray.get(server.start.remote())
```
This successfully makes the tensorboard logging directories and file writers, since the `init_logger` method is not a ray function. However, it produces the following error:
```
C:\ProgramData\Anaconda3\envs\venv_Ray\python.exe C:/Users/kaiyu/Desktop/aps-ray-rllib/FullCycle/etc/tb_test.py
2021-06-24 09:05:16,452 INFO services.py:1267 -- View the Ray dashboard at http://127.0.0.1:8265
Server class
Init TBLogger.
--- I got rid of all the tensorflow related logs ---
TB Logger for eval-0 made!
TB Logger for eval-1 made!
TB Logger for eval-2 made!
TB Logger for eval-3 made!
TB Logger for eval-4 made!
TB Logger for eval-5 made!
Traceback (most recent call last):
  File "C:/Users/kaiyu/Desktop/aps-ray-rllib/FullCycle/etc/tb_test.py", line 55, in <module>
    ray.get(server.start.remote())
  File "C:\Users\kaiyu\AppData\Roaming\Python\Python38\site-packages\ray\remote_function.py", line 104, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs)
  File "C:\Users\kaiyu\AppData\Roaming\Python\Python38\site-packages\ray\remote_function.py", line 307, in _remote
    return invocation(args, kwargs)
  File "C:\Users\kaiyu\AppData\Roaming\Python\Python38\site-packages\ray\remote_function.py", line 275, in invocation
    list_args = ray._private.signature.flatten_args(
  File "C:\Users\kaiyu\AppData\Roaming\Python\Python38\site-packages\ray\_private\signature.py", line 116, in flatten_args
    raise TypeError(str(exc)) from None
TypeError: missing a required argument: 'self'

Process finished with exit code 1
```
So, is it safe to conclude that I can’t wrap an individual method within a class with a ray decorator? It seems like only standalone functions and entire classes can use ray decorators.
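To see why the method version fails, here is a toy stand-in for `ray.remote` (a sketch for intuition, not Ray’s actual implementation): the decorator captures the plain function at class-definition time, before any instance exists, so calling it through `.remote()` never supplies `self`.

```python
def fake_remote(func):
    """Toy stand-in for ray.remote: captures the raw function and
    exposes a .remote() that forwards only the arguments you pass
    explicitly -- no instance is ever bound to it."""
    class Handle:
        def remote(self, *args, **kwargs):
            return func(*args, **kwargs)
    return Handle()


class Server:
    @fake_remote
    def start(self):
        return "started"


server = Server()
try:
    server.start.remote()  # `self` is never supplied
    raised = False
except TypeError as exc:
    raised = True
    print(exc)  # start() missing 1 required positional argument: 'self'
```

Ray raises the analogous error from its argument-flattening code, which is why decorating the whole class (making it an actor) is the supported route.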
Anyways, I tried your original recommendation like the following code:
```python
@ray.remote
class Server:
    def __init__(self):
        print("Server class")

    def init_logger(self):
        self.logger = TBLogger()
        self.logger.init_logger()

    def start(self):
        for i in range(100):
            self.logger.log(i)


if __name__ == "__main__":
    ray.init()
    server = Server.remote()
    server.init_logger.remote()
    ray.get(server.start.remote())
```
And it works like a charm!
However, I wanted to avoid this since I wanted the `start` method to run in the background.
As you can read from my original post, I am trying to run DQN with PBT while having my custom evaluator client and logging server running in the background. The client constantly checks for any new checkpoint, evaluates it, and sends the evaluation result to the server, which logs the data with tb.writers. And I want to do all of this in one script. The reason I want to avoid wrapping a `ray.get` around the `start` method is that it is a blocking call and won’t let other functions execute if I incorporate all of these functionalities into one script.
Now that I think about it, it may not be necessary to implement all of this in one script. But if I do, am I right that `ray.get` will block and keep the others from being executed?
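For what it’s worth, the blocking-vs-background split can be sketched with the standard library’s futures (an analogy only, not Ray itself): `executor.submit` plays the role of `.remote()` (returns immediately), and `Future.result()` plays the role of `ray.get` (blocks until the work finishes).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task():
    time.sleep(0.2)
    return "done"


with ThreadPoolExecutor() as executor:
    t0 = time.time()
    future = executor.submit(slow_task)  # like task.remote(): returns immediately
    submitted_after = time.time() - t0   # essentially zero: caller is still free

    result = future.result()             # like ray.get(ref): blocks right here
    waited = time.time() - t0            # now >= 0.2s: we waited for the task

print(result)
```

So leaving off the `ray.get` is what keeps a call non-blocking; the catch is that the driver script must stay alive (e.g. a `ray.get` or `ray.wait` at the very end), or the background work is torn down when the script exits.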
Here’s pseudocode of what I’d like to implement:
```python
@ray.remote
def train():
    results = tune.run("DQN", scheduler=pbt, ...)
    return results


@ray.remote
class LoggerServer:
    def init_server(self):
        ...

    def init_logger(self):
        # Initializes tensorboard loggers.
        ...

    def start_logging(self):
        # Runs a while loop that constantly checks for
        # new evaluation results from the client.
        # Logs the results whenever available.
        ...


@ray.remote
class ReplayerClient:
    def init_client(self):
        ...

    def check_new_checkpoints(self):
        # Checks if any new checkpoint from training is made.
        ...

    def init_replayer(self, checkpoint_path):
        # Initializes the "replayer" for evaluation,
        # i.e., initializes the simulator for eval
        # and a Ray DQNTrainer with the given checkpoint.
        ...

    def start_replay(self):
        # Runs a while loop that constantly calls
        # `check_new_checkpoints` and runs `init_replayer`
        # whenever a new checkpoint is available.
        ...


if __name__ == "__main__":
    # Starts running `train` in the background.
    # This runs without any issue.
    train_results = train.remote()

    # Inits `LoggerServer` and `ReplayerClient`.
    server = LoggerServer.remote()
    client = ReplayerClient.remote()

    # Starts the evaluation and logging process.
    server.start_logging.remote()
    client.start_replay.remote()
```
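In case it helps the discussion, here is a minimal polling sketch for the `check_new_checkpoints` piece. The directory layout (one `checkpoint*` entry per checkpoint under the results folder) is an assumption based on how Tune typically writes results, so the pattern would need adjusting to the actual log dir:

```python
import os
import tempfile


def check_new_checkpoints(checkpoint_dir, seen):
    """Return paths of checkpoints that appeared since the last call.

    `seen` is a set of already-processed checkpoint names. The
    `checkpoint*` naming convention is an assumption about the layout.
    """
    if not os.path.isdir(checkpoint_dir):
        return []
    new = sorted(
        name for name in os.listdir(checkpoint_dir)
        if name.startswith("checkpoint") and name not in seen
    )
    seen.update(new)
    return [os.path.join(checkpoint_dir, name) for name in new]


# Tiny demonstration with a temp directory standing in for the results dir.
with tempfile.TemporaryDirectory() as d:
    seen = set()
    print(check_new_checkpoints(d, seen))          # [] -- nothing yet
    os.mkdir(os.path.join(d, "checkpoint_000001"))
    print(check_new_checkpoints(d, seen))          # the new checkpoint, once
    print(check_new_checkpoints(d, seen))          # [] -- not reported twice
```

`start_replay` would then just loop, sleep briefly, and feed any returned paths into `init_replayer`.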
As you can see, I’m very new to coding with Ray functions, so if you can take another look at this, it’ll be very helpful.
Thanks again!