Ray tune not logging episode metrics with SampleBatch input

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

I am having an issue using Ray Tune with the Ray Client/Server setup. I do not have a simulator environment; I am using a live real-world system to gather SampleBatches and then intend to use those for model training. My issue is that Ray Tune does not seem to be logging episode metrics as part of training, so I cannot determine which trial/checkpoint is the best one to use for deployment.
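For context, each line of my offline input files looks roughly like this (a sketch of RLlib's JSON-lines offline format; the field names follow SampleBatch columns, and all values here are made up):

```python
import json

# One SampleBatch worth of experience, serialized as a single JSON line
# (sketch of RLlib's offline JSON format; values are illustrative).
record = {
    "type": "SampleBatch",
    "obs": [[0.1, 0.2, 0.3, 0.4, 0.5], [0.2, 0.1, 0.0, 0.3, 0.9]],
    "actions": [2, 5],
    "rewards": [0.0, 1.0],
    "dones": [False, True],   # complete episode: last step is terminal
    "eps_id": [17, 17],
}
line = json.dumps(record)
print(json.loads(line)["dones"][-1])  # True
```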

This issue seems to be related to something on the GitHub issue board.

    analysis = tune.run(DQNTrainer,
                        config={"framework": "torch",
                                "num_workers": 1,
                                "num_gpus": 0,
                                "batch_mode": "complete_episodes",
                                "input": sample_batch_path,
                                "env": None,
                                "observation_space": gym.spaces.Box(0, 1, shape=(5,)),
                                "action_space": gym.spaces.Discrete(6)},
                        stop={"training_iteration": 1})
    best_trial = analysis.get_best_trial(metric='total_loss', mode='min', scope='all')
    best = analysis.get_best_checkpoint(best_trial, metric='total_loss', mode='min')
    return best

I receive NaNs for all the metrics.

The trainer also reports episodes_total = 0, which is not right: my SampleBatches contain full episodes with done=True on the last step.

Is there a way I can get Ray Tune working with SampleBatch-style input?
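One sanity check I ran first (pure Python; the JSON-lines layout and column names are assumptions based on RLlib's offline format) is to confirm that every episode in the batch files actually terminates:

```python
import json

def episodes_terminate(lines):
    """Return True if the final recorded step of every eps_id has done=True.

    `lines` are JSON-lines records with SampleBatch columns
    (assumed layout; adapt to your writer's output)."""
    last_done = {}
    for line in lines:
        batch = json.loads(line)
        for eps_id, done in zip(batch["eps_id"], batch["dones"]):
            last_done[eps_id] = done
    return all(last_done.values())

# Illustrative records: episode 1 terminates, episode 2 does not.
good = json.dumps({"eps_id": [1, 1], "dones": [False, True]})
bad = json.dumps({"eps_id": [2, 2], "dones": [False, False]})
print(episodes_terminate([good]))        # True
print(episodes_terminate([good, bad]))   # False
```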

@Yard1 @kai Is this something the Tune experts could help with?

I believe the issue is in RLlib's sample batch handling. Tune just forwards whatever metrics it receives, so it's likely that RLlib doesn't provide the correct metrics for this input.
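A plausible mechanism for the NaNs (my reading, not a verified trace through the RLlib source): episode metrics are means over the episodes the sampler collected during the iteration, and with pure offline input no episodes are sampled, so the mean degenerates:

```python
def summarize(episode_rewards):
    # Mirrors the usual summarization convention: the mean over
    # collected episodes is NaN when nothing was collected.
    if not episode_rewards:
        return float("nan")
    return sum(episode_rewards) / len(episode_rewards)

print(summarize([1.0, 3.0]))  # 2.0
print(summarize([]))          # nan
```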

cc @arturn as RLLib on-call


Hey @Jason_Weinberg ,

After this gets merged, try checking out master and let me know if it works for you or if you are still missing anything.


Awesome, I will give it a shot! Thank you for jumping in on this so fast.


Has this update made it to the nightly release?
I tried the commands below but am still getting NaNs in the output:

    pip install -U ray
    ray install-nightly

I tried this and it works for me!


@arturn I seem to be running into the same issue of RLlib not logging metrics back to Tune when RLlib does not have access to an environment and instead relies purely on SampleBatch inputs.
I am using version ray-1.13.0.

Has there been some sort of regression in the code on this?

My environment is specified in the config directly:

    'env': None,
    'observation_space': gym.spaces.Box(-np.inf, np.inf, shape=(STATE_DIM,)),
    'action_space': gym.spaces.Discrete(ACTION_DIM),

    analysis = tune.run(DQNTrainer,
                        config={"framework": "torch",
                                "num_workers": 1,
                                "num_gpus": 0 if DEBUG else 1,
                                "batch_mode": "complete_episodes",
                                "input": os.path.join(os.getcwd(), sample_batch_path)},
                        stop={"training_iteration": 10})
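One setting worth double-checking (an assumption on my part, based on the offline-data options that existed around Ray 1.13): with offline input, RLlib can report reward estimates via off-policy estimators instead of sampler episodes, e.g.:

```python
config.update({
    # Off-policy estimation over the offline data (importance sampling /
    # weighted importance sampling), so a reward estimate is reported
    # even without a live environment:
    "input_evaluation": ["is", "wis"],
})
```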


@sven1977 @arturn I think I am seeing the issue now. I am training locally with Ray Tune and DQNTrainer. It looks like the fix was integrated directly into the policy server/client modules. Is there a way to include this directly in the DQNTrainer so that it sends metrics to Tune?

Hi. I’m running a client-server configuration with Tune and a DQNTrainer on Ray 1.13.
To fix the metric-logging problem I downloaded and replaced the files from this previous comment:

That works for me.

@hermmanhender OK, that's good to hear. Where are you integrating the ray.tune train call in your client/server config? I was trying to train the model separately from my client/server setup.

The way I am set up is that I gather SampleBatches of experiences each day. Then I process those batches and train the model each night in its own separate cluster, away from the server/client setup. Then I redeploy into the server each night using a checkpoint. Is that not the best way to do it?

When I say I deployed the model using the client/server method, I mean this: I am using Ray Serve and I call the model via an endpoint. I train the model batch-style on a dedicated Ray training cluster. There is no training on that deployment server.

    class ServeAgentModel:
        def __init__(self, checkpoint_path) -> None:
            self.trainer = DQNTrainer(config={
                "framework": "torch",
                "num_workers": 0,
                "num_gpus": 0,
            })
            self.trainer.restore(checkpoint_path)

        async def __call__(self, request: Request):
            json_input = await request.json()
            obs = json_input["observation"]

            action = self.trainer.compute_single_action(obs, full_fetch=True)
            return {"action": int(action[0]),
                    "action_prob": float(action[2]["action_prob"]),
                    "action_logp": float(action[2]["action_logp"])}
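For what it's worth, the `full_fetch=True` return unpacks like this (a sketch with a dummy tuple standing in for the real trainer call; the extra-info keys are taken from the snippet above):

```python
def to_response(result):
    # With full_fetch=True, compute_single_action returns a
    # (action, rnn_state, extra_info) tuple.
    action, _rnn_state, extra = result
    return {"action": int(action),
            "action_prob": float(extra["action_prob"]),
            "action_logp": float(extra["action_logp"])}

# Dummy stand-in for a real trainer result:
dummy = (3, [], {"action_prob": 0.9, "action_logp": -0.105})
print(to_response(dummy))  # {'action': 3, 'action_prob': 0.9, 'action_logp': -0.105}
```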

@hermmanhender how are you using the PolicyClient object along with Tune? I don't see any pattern in the documentation that allows that. Are you doing real-time or batch training?

@arturn do you have any insight?