Tuning Hyperparameters of RLlib PPO on SageMaker

I have built a Docker container for RLlib's PPO algorithm, and I am using Amazon SageMaker to tune its hyperparameters. SageMaker requires metric_definitions for the metric to optimize. What should I use in the SageMaker metric_definitions? I am not using SageMaker's RL estimator. Thanks in advance for your response.

Can you provide more info on your setup? How does the RLlib model hook into SageMaker HPO? From my cursory understanding, metric_definitions can be set to whatever value you want to optimize, which would usually be episode_reward_mean.
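As a rough illustration of the point above: SageMaker extracts a metric by applying the regex from each metric definition to the training job's log output and reading the first capture group. The sketch below checks such a regex locally before submitting a job. The sample log line is an assumption; the real format depends on how train.py prints its results.

```python
import re

# Hypothetical log line; the exact format depends on what train.py prints.
SAMPLE_LOG = "episode_reward_mean: 123.45"

# One metric definition: "Name" is the label, "Regex" captures the value.
metric_definitions = [
    {
        "Name": "episode_reward_mean",
        "Regex": r"episode_reward_mean: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)",
    }
]


def extract_metric(log_text, definition):
    """Return the first captured number as a float, or None if nothing matches."""
    match = re.search(definition["Regex"], log_text)
    return float(match.group(1)) if match else None


value = extract_metric(SAMPLE_LOG, metric_definitions[0])
print(value)
```

If the function returns None for a line copied from your own training logs, the regex will not produce a metric in SageMaker either.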

Also, RLlib generally plays really well with Ray Tune for HPO. Would you be able to use Tune instead of SageMaker HPO?
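For comparison, a minimal sketch of what the Ray Tune route might look like, assuming a Ray 1.x-style API (tune.run with the "PPO" trainable name) and a registered Gym environment. The environment id, sample count, and stopping criterion are placeholders, and result key names such as episode_reward_mean can differ between Ray versions.

```python
# Plain-Python bounds, so this part loads without Ray installed.
SEARCH_BOUNDS = {
    "gamma": (0.30, 0.50),
    "lr": (0.0001, 0.0002),
}


def run_ppo_search(env_name="CartPole-v0", num_samples=4):
    """Launch a small PPO search with Ray Tune (needs `pip install "ray[rllib]"`)."""
    from ray import tune  # imported lazily; only needed when the search runs

    config = {
        "env": env_name,  # assumption: a registered Gym environment id
        "gamma": tune.uniform(*SEARCH_BOUNDS["gamma"]),
        "lr": tune.uniform(*SEARCH_BOUNDS["lr"]),
    }
    analysis = tune.run(
        "PPO",
        config=config,
        metric="episode_reward_mean",  # the quantity to optimize
        mode="max",
        num_samples=num_samples,
        stop={"training_iteration": 20},  # placeholder stopping rule
    )
    return analysis.best_config
```

Calling run_ppo_search() would start the trials; nothing is launched just by defining it.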

I have made a Docker container with the following Dockerfile:

FROM tensorflow/tensorflow:2.3.0-gpu
RUN apt-get update && apt-get install -y --no-install-recommends nginx curl

RUN pip install sagemaker-containers
RUN pip install ray[default]
RUN pip install ray[rllib]
RUN pip install torch
RUN pip install jupyterlab
RUN pip install gym
RUN pip install pandas
RUN pip install numpy
RUN pip install matplotlib
RUN pip install pickle-mixin
RUN pip install datetime
RUN pip install temp

# Copies the training code inside the container

# Defines train.py as script entry point

ENV PATH="/opt/ml/code:${PATH}"

COPY /ppo_amd /opt/ml/code
WORKDIR /opt/ml/code

This Docker container hooks up with SageMaker and runs the program.
When I try to use the SageMaker hyperparameter tuner, it gives me the following error:

ClientError Traceback (most recent call last)
----> 1 tuner.fit()

/usr/local/lib/python3.6/site-packages/sagemaker/tuner.py in fit(self, inputs, job_name, include_cls_metadata, estimator_kwargs, wait, **kwargs)
442 """
443 if self.estimator is not None:
→ 444 self._fit_with_estimator(inputs, job_name, include_cls_metadata, **kwargs)
445 else:
446 self._fit_with_estimator_dict(inputs, job_name, include_cls_metadata, estimator_kwargs)

/usr/local/lib/python3.6/site-packages/sagemaker/tuner.py in _fit_with_estimator(self, inputs, job_name, include_cls_metadata, **kwargs)
453 self._prepare_estimator_for_tuning(self.estimator, inputs, job_name, **kwargs)
454 self._prepare_for_tuning(job_name=job_name, include_cls_metadata=include_cls_metadata)
→ 455 self.latest_tuning_job = _TuningJob.start_new(self, inputs)
457 def _fit_with_estimator_dict(self, inputs, job_name, include_cls_metadata, estimator_kwargs):

/usr/local/lib/python3.6/site-packages/sagemaker/tuner.py in start_new(cls, tuner, inputs)
1507 ]
→ 1509 tuner.sagemaker_session.create_tuning_job(**tuner_args)
1510 return cls(tuner.sagemaker_session, tuner._current_job_name)

/usr/local/lib/python3.6/site-packages/sagemaker/session.py in create_tuning_job(self, job_name, tuning_config, training_config, training_config_list, warm_start_config, tags)
2027 LOGGER.info("Creating hyperparameter tuning job with name: %s", job_name)
2028 LOGGER.debug("tune request: %s", json.dumps(tune_request, indent=4))
→ 2029 self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)
2031 def describe_tuning_job(self, job_name):

/usr/local/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
→ 357 return self._make_api_call(operation_name, kwargs)
359 _api_call.__name__ = str(py_operation_name)

/usr/local/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
674 error_code = parsed_response.get(“Error”, {}).get(“Code”)
675 error_class = self.exceptions.from_code(error_code)
→ 676 raise error_class(parsed_response, operation_name)
677 else:
678 return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateHyperParameterTuningJob operation: A metric is required for this hyperparameter tuning job objective. Provide a metric in the metric definitions.

I have used metric_definitions as follows:

    "Name": "episode_reward_mean",
    "Regex": "episode_reward_max: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)",

from sagemaker.tuner import HyperparameterTuner, IntegerParameter, CategoricalParameter, ContinuousParameter

hyperparameter_ranges = {
    "gamma": ContinuousParameter(0.30, 0.50),
    "lr": ContinuousParameter(0.0001, 0.0002),
}

objective_metric_name = "episode_reward_mean"
tuner = HyperparameterTuner(



This gives me the error illustrated above.
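The error message says the objective metric has no matching entry in the metric definitions the tuning job received. The tuner call in the post is cut off, so the cause cannot be confirmed from it. One thing worth checking is that objective_metric_name matches a "Name" in metric_definitions and that the list is actually passed to HyperparameterTuner. The sketch below does both. The build_tuner helper, the job counts, and the use of episode_reward_mean in the regex are assumptions (the regex quoted in the post captures episode_reward_max).

```python
metric_definitions = [
    {
        "Name": "episode_reward_mean",
        # Assumption: train.py logs a line like "episode_reward_mean: 12.3".
        "Regex": r"episode_reward_mean: ([-+]?[0-9]*[.]?[0-9]+([eE][-+]?[0-9]+)?)",
    }
]
objective_metric_name = "episode_reward_mean"


def objective_is_defined(objective_name, definitions):
    """True if some metric definition carries the objective's name."""
    return any(d.get("Name") == objective_name for d in definitions)


def build_tuner(estimator, hyperparameter_ranges):
    """Hypothetical helper: pass the definitions explicitly to the tuner."""
    from sagemaker.tuner import HyperparameterTuner  # needs the sagemaker SDK

    if not objective_is_defined(objective_metric_name, metric_definitions):
        raise ValueError("objective metric has no matching metric definition")
    return HyperparameterTuner(
        estimator=estimator,
        objective_metric_name=objective_metric_name,
        hyperparameter_ranges=hyperparameter_ranges,
        metric_definitions=metric_definitions,
        objective_type="Maximize",
        max_jobs=4,           # placeholder values
        max_parallel_jobs=2,
    )
```

With a custom image and the generic Estimator, SageMaker has no built-in metrics to fall back on, so the definitions have to be supplied this way.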

