How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I set up a policy client/server training architecture for a multi-agent environment running on an external simulator. N client/server pairs are instantiated, one for each agent, and all of them run on the same machine (for now). After the first server is instantiated, the system deadlocks on the `.train()` call at the end of its setup, outputting just `{}`; if I remove that call, the code runs fine.
```python
from ray.rllib.agents.registry import get_trainer_class
from ray.rllib.env.policy_server_input import PolicyServerInput

def policy_Server(self):
    cppu_number = self.cppu_name[-1:]
    self.LOCAL_PORT = self.SERVER_BASE_PORT + int(cppu_number) * 10

    def _input(io_ctx):
        # Create a PolicyServerInput.
        if self.num_workers == 0:
            self.local_address = self.LOCAL_PORT
            self.server = PolicyServerInput(
                io_ctx, self.SERVER_ADDRESS, self.LOCAL_PORT)
            self.logger.info(
                f'Server for {self.cppu_name} @ {self.LOCAL_PORT} initialized')
            return self.server
        # No InputReader (PolicyServerInput) needed.
        else:
            self.server = None
            self.logger.info(f'Server for {self.cppu_name} not initialized')
            return self.server

    config_file = {
        # Indicate that the Trainer we set up here doesn't need an actual env.
        # Allow spaces to be determined by the user (see below).
        "env": None,
        # Retrieve the env's spaces.
        "observation_space": self.observation_space,
        "action_space": self.action_space,
        # Use the `PolicyServerInput` to generate experiences.
        "input": (lambda io_ctx: _input(io_ctx)),
        "input_evaluation": [],
        # Use n worker processes to listen on different ports.
        "num_workers": self.num_workers,
        # Disable OPE, since the rollouts are coming from online clients.
        # "off_policy_estimation_methods": {},
        # Set to INFO so we'll see the server's actual address:port.
        "log_level": "INFO",
        # Other trainer settings.
        "train_batch_size": 256,
        "rollout_fragment_length": 20,
        "framework": "tf",
    }

    self.algorithm = get_trainer_class("PPO")(config=config_file)
```
```python
from ray.rllib.env.policy_client import PolicyClient

def policy_Client(self):
    try:
        self.client = PolicyClient(
            "http://localhost:" + str(self.LOCAL_PORT),
            inference_mode=self.inference_mode)
        self.logger.info(
            f'Client for {self.cppu_name} connected @ {self.LOCAL_PORT}')
    except Exception:
        self.logger.info(
            f'Client for {self.cppu_name} connection failed @ {self.LOCAL_PORT}')
```
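To help tell a failed connection apart from a genuine deadlock, a small stdlib probe can confirm that the server socket is actually listening before the client is created (a minimal sketch; `port_is_open` is a hypothetical helper, not part of Ray):

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the server port before instantiating the PolicyClient.
# assert port_is_open("localhost", self.LOCAL_PORT)
```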
```python
class Agent:
    def __init__(self):
        self.policy_Server()
        self.policy_Client()
        while True:
            # Perform one iteration of training the policy with PPO.
            result = self.algorithm.train()
            self.logger.info('Training enabled')
```
I was wondering whether the issue stays in the lack of connection/communication or it is due to the fact that I should explicitly create a thread for each client and server before instantiating them. Looking at the cartPole and Unity examples, I noticed that clients and servers are usually lunched on two different shell instances, while on my project they are not.
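If `PolicyServerInput` is indeed blocking `train()` until a client sends experiences, then running the training loop in a background thread within the same process would mimic the two-shell setup from those examples. A minimal sketch under that assumption, reusing the `Agent` class above (the threading layout is an untested suggestion, not a confirmed fix):

```python
import threading

class Agent:
    def __init__(self):
        self.policy_Server()

        # Run the blocking train() loop off the main thread, so that the
        # PolicyClient below can connect while train() waits for samples.
        def _train_loop():
            while True:
                result = self.algorithm.train()
                self.logger.info('Training iteration finished')

        self.train_thread = threading.Thread(target=_train_loop, daemon=True)
        self.train_thread.start()

        # The server is already listening at this point, so the client
        # can connect from the main thread.
        self.policy_Client()
```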