How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I have a client-server training setup for a simple game: a single client, with the client and server running on the same machine. For about two hours everything chugs along smoothly, with the client and server exchanging data and the trainer training the agent. The output from the server looks like this:
2022-06-27 14:05:25,047 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:05:25] "POST / HTTP/1.1" 200 -
2022-06-27 14:05:35,091 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:05:35] "POST / HTTP/1.1" 200 -
2022-06-27 14:05:41,121 INFO policy_server_input.py:206 -- Got sample batch of size 20 from client.
127.0.0.1 - - [27/Jun/2022 14:05:41] "POST / HTTP/1.1" 200 -
2022-06-27 14:05:45,140 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:05:45] "POST / HTTP/1.1" 200 -
2022-06-27 14:05:55,182 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:05:55] "POST / HTTP/1.1" 200 -
2022-06-27 14:06:01,212 INFO policy_server_input.py:206 -- Got sample batch of size 20 from client.
127.0.0.1 - - [27/Jun/2022 14:06:01] "POST / HTTP/1.1" 200 -
2022-06-27 14:06:05,231 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:06:05] "POST / HTTP/1.1" 200 -
2022-06-27 14:06:15,274 INFO policy_server_input.py:200 -- Sending worker weights to client.
127.0.0.1 - - [27/Jun/2022 14:06:15] "POST / HTTP/1.1" 200 -
2022-06-27 14:06:21,306 INFO policy_server_input.py:206 -- Got sample batch of size 20 from client.
127.0.0.1 - - [27/Jun/2022 14:06:21] "POST / HTTP/1.1" 200 -
agent_timesteps_total: 7800
custom_metrics: {}
date: 2022-06-27_14-06-21
done: false
episode_len_mean: .nan
episode_media: {}
episode_reward_max: .nan
episode_reward_mean: .nan
episode_reward_min: .nan
episodes_this_iter: 0
episodes_total: 0
experiment_id: 80696cf9082541d5b48af06c816b5995
hostname: pascal6
info:
  learner:
    maze:
      custom_metrics: {}
      learner_stats:
        cur_lr: 9.999999747378752e-05
        entropy_coeff: 0.009999999776482582
        grad_gnorm: 11.02481460571289
        model: {}
        policy_entropy: 307.48297119140625
        policy_loss: -2.469358205795288
        var_gnorm: 22.7189884185791
        vf_explained_var: -0.4804614782333374
        vf_loss: 0.0022206767462193966
      num_agent_steps_trained: 50.0
  num_agent_steps_sampled: 7800
  num_agent_steps_trained: 7800
  num_steps_sampled: 7800
  num_steps_trained: 7800
  num_steps_trained_this_iter: 60
iterations_since_restore: 130
node_ip: 192.168.128.6
num_healthy_workers: 0
off_policy_estimator: {}
perf:
  cpu_util_percent: 0.016279069767441864
  ram_util_percent: 2.9
pid: 29528
policy_reward_max: {}
policy_reward_mean: {}
policy_reward_min: {}
sampler_perf: {}
time_since_restore: 7787.611072063446
time_this_iter_s: 59.59892225265503
time_total_s: 7787.611072063446
timers:
  learn_throughput: 17040.434
  learn_time_ms: 3.521
  load_throughput: 379231.826
  load_time_ms: 0.158
  sample_throughput: 0.997
  sample_time_ms: 60182.14
timestamp: 1656363981
timesteps_since_restore: 7800
timesteps_this_iter: 60
timesteps_total: 7800
training_iteration: 130
trial_id: default
warmup_time: 2.693452835083008
and from the client:
...
2022-06-27 14:06:05,233 INFO policy_client.py:279 -- Updating rollout worker weights and global vars {'timestep': 7740}.
2022-06-27 14:06:15,270 INFO policy_client.py:269 -- Querying server for new policy weights.
2022-06-27 14:06:15,278 INFO policy_client.py:279 -- Updating rollout worker weights and global vars {'timestep': 7740}.
2022-06-27 14:06:21,302 INFO policy_client.py:304 -- Sending batch of 20 env steps (20 agent steps) to server.
2022-06-27 14:06:21,308 INFO policy_client.py:298 -- Generating new batch of experiences.
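For reference, the client side is essentially the standard PolicyClient loop in local inference mode (which is where the weight-sync messages above come from). A rough sketch of what it does follows; MazeGameEnv is a placeholder for my actual game, and I have omitted the per-agent dict handling my multi-agent env really uses:

# Rough sketch of the client loop (MazeGameEnv is a placeholder, not the real class).
from ray.rllib.env.policy_client import PolicyClient

env = MazeGameEnv()  # placeholder for the actual game environment
client = PolicyClient("http://localhost:9900", inference_mode="local")

obs = env.reset()
episode_id = client.start_episode(training_enabled=True)
while True:
    action = client.get_action(episode_id, obs)
    obs, reward, done, info = env.step(action)
    client.log_returns(episode_id, reward)
    if done:
        client.end_episode(episode_id, obs)
        obs = env.reset()
        episode_id = client.start_episode(training_enabled=True)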
Then, out of the blue, the server dies: no warning, no errors, the process just ends. The client, of course, complains that it can no longer connect and crashes with an error. From the way the server exits, it looks as though it thinks it has finished its job, but I never specified an end time for the server; I told it to run forever. Here is my server script:
import time

import numpy as np
from gym.spaces import Box

from ray.rllib.agents import a3c
from ray.rllib.env.policy_server_input import PolicyServerInput
from ray.rllib.policy.sample_batch import MultiAgentBatch
from ray.tune.logger import pretty_print

# Single policy "maze": 2-D box observation space, 2-D continuous action space.
policies = {
    'maze': (
        None,
        Box(low=np.array([0, 0]), high=np.array([2100, 2100])),
        Box(-100, 100, (2,)),
        {}
    ),
}

def policy_mapping_fn(agent_id):
    return "maze"

server_address = "localhost"
server_port = 9900

class PolicyServerInputMA(PolicyServerInput):
    '''
    For a multi-agent server, use this class or alternatively set the 'idle_timeout'
    argument to a large value.
    '''
    def _put_empty_sample_batch_every_n_sec(self):
        # The original method puts an empty SampleBatch onto the sample queue every
        # `idle_timeout` seconds to avoid hanging parallel RolloutWorkers. The problem is
        # that in a multi-agent environment those empty SampleBatch objects conflict with
        # the MultiAgentBatch objects sent by the client. Either use this class or set
        # 'idle_timeout' to a large number in multi-agent envs.
        while True:
            time.sleep(self.idle_timeout)
            self.samples_queue.put(MultiAgentBatch({}, 0))

def _input(ioctx):
    # We are a remote worker, or we are the local worker with num_workers=0:
    # create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInputMA(
            ioctx,
            server_address,
            server_port,
            idle_timeout=3.0
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None

config = {
    # Use the connector server to generate experiences.
    "input": _input,
    "env": None,
    "disable_env_checking": True,
    "multiagent": {
        "policies": policies,
        "policy_mapping_fn": policy_mapping_fn,
        # "count_steps_by": "agent_steps"
    },
    "num_gpus": 0,
    # "framework": "tf",
    "num_workers": 0,
    "log_level": "INFO",
    # Disable OPE, since the rollouts are coming from online clients.
    "input_evaluation": [],
    "train_batch_size": 50,
    "rollout_fragment_length": 20,
    # "batch_mode": "truncate_episodes",
}

trainer = a3c.a2c.A2CTrainer(config=config)

# Serving and training loop.
i = 0
# Since this is `while True`, I expect the server to run forever until I send a kill signal.
while True:
    results = trainer.train()
    trainer.export_policy_checkpoint(
        f"checkpoint{i}",
        policy_id="maze"
    )
    print(pretty_print(results))
    i += 1
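For what it's worth, my next step is to wrap the loop so that anything Python-level shows up before the process disappears. These are debugging-only additions I plan to try, not part of the script that crashed:

# Hypothetical debugging additions to the loop above (not in the original script).
import faulthandler
import traceback

faulthandler.enable()  # dump a traceback if the process dies on SIGSEGV/SIGABRT-style signals

i = 0
try:
    while True:
        results = trainer.train()
        trainer.export_policy_checkpoint(f"checkpoint{i}", policy_id="maze")
        print(pretty_print(results))
        i += 1
except BaseException:
    # A SystemExit or any other exception would show up here instead of vanishing silently.
    traceback.print_exc()
    raise
finally:
    print("Training loop exited.", flush=True)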
Is there a hidden parameter somewhere that is causing the trainer to end early?
Thanks!