How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
Across multiple trials of my experiment, the evaluation reward and episode length always stay at a constant value.
I am using Ray 2.0.0, and I wonder whether this is caused by my evaluation configuration options. Do I have to set explore to True? Even then, the behaviour confuses me: each evaluation run should differ, so it's strange that my policy seems to produce the same result every time. Here is my evaluation config:
.evaluation(
    evaluation_interval=10,
    evaluation_duration=10,
    evaluation_duration_unit='episodes',
    # evaluation_sample_timeout_s=180.0,
    evaluation_parallel_to_training=False,
    evaluation_config={
        # Take deterministic actions during evaluation.
        'explore': False,
        'exploration_config': {'type': 'StochasticSampling'},
    },
    evaluation_num_workers=1,
    # custom_evaluation_function=None,
    always_attach_evaluation_results=True,
    # in_evaluation=False,
    # sync_filters_on_rollout_workers_timeout_s=60.0,
    evaluation_sample_timeout_s=3600,
)
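For comparison, this is the alternative I was considering, which just flips explore to True so that evaluation samples actions stochastically instead of always taking the deterministic action (I'm not sure this is the intended way to get varying evaluation results):

.evaluation(
    evaluation_interval=10,
    evaluation_duration=10,
    evaluation_duration_unit='episodes',
    evaluation_num_workers=1,
    evaluation_config={
        # Keep stochastic action sampling enabled during evaluation,
        # so repeated evaluation episodes can differ from each other.
        'explore': True,
        'exploration_config': {'type': 'StochasticSampling'},
    },
)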
Please advise.