Off-policy evaluation - how to control batch sample size

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

How is the default batch sample size that gets passed to the OffPolicyEstimator.estimate(…) method computed?

OffPolicyEstimator.estimate(batch: SampleBatchType, split_batch_by_episode: bool = True)

Is there a way to pass a batch sample size parameter using the off-policy evaluation API:

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_="/tmp/cartpole-out")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "/tmp/cartpole-eval"},
        off_policy_estimation_methods={
            "wis": {"type": WeightedImportanceSampling},
        },
    )
)

Thanks,
Stefan

So currently we estimate on the whole dataset using estimator.estimate_on_dataset and split it across all eval workers: ray/weighted_importance_sampling.py at master · ray-project/ray · GitHub

The effective batch size is eval_dataset.count() // evaluation_num_workers
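To make that concrete, here is a minimal illustration of how the per-worker batch size falls out of that formula. The variable names mirror the ones above, but this is just arithmetic, not RLlib's actual splitting code:

```python
# Per-worker batch size, as described above:
# eval_dataset.count() // evaluation_num_workers.
total_rows = 1000          # stand-in for eval_dataset.count()
evaluation_num_workers = 4

per_worker_batch = total_rows // evaluation_num_workers
print(per_worker_batch)    # 250
```

Note the integer division: any remainder rows beyond a multiple of the worker count are not reflected in this effective size.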

When the off-policy evaluation associated with this code executes, it invokes WeightedImportanceSampling.estimate_on_single_episode(…). How can I get it to estimate using the whole dataset using the WeightedImportanceSampling.estimate_on_dataset(…) method that you mentioned?

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_="/tmp/cartpole-out")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "/tmp/cartpole-eval"},
        off_policy_estimation_methods={
            "wis": {"type": WeightedImportanceSampling},
        },
    )
)

Ah, my fault: currently we only call estimate_on_dataset for bandit problems, i.e. when split_batch_by_episode = False.

Otherwise, we sample batches from the evaluation workers and call estimator.estimate(batch), which splits the batch into episodes and calls estimate_on_single_episode on each one. In this case the overall batch size matches your evaluation config.
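The branching described above can be sketched roughly as follows. This is an illustrative toy, not RLlib's implementation: the batch is a plain list of dicts standing in for a SampleBatch, and the per-episode "estimate" is just the episode return:

```python
# Hedged sketch of the estimate() control flow described above.
def split_by_episode(batch):
    # Group rows into episodes by their eps_id (order-preserving).
    episodes = {}
    for row in batch:
        episodes.setdefault(row["eps_id"], []).append(row)
    return list(episodes.values())

def estimate_on_single_episode(episode):
    # Placeholder per-episode estimate: the episode return.
    return sum(row["reward"] for row in episode)

def estimate(batch, split_batch_by_episode=True):
    if split_batch_by_episode:
        # Default path: one estimate per episode in the batch.
        return [estimate_on_single_episode(ep) for ep in split_by_episode(batch)]
    # Bandit path (split_batch_by_episode=False): whole batch at once.
    return [sum(row["reward"] for row in batch)]

batch = [
    {"eps_id": 0, "reward": 1.0},
    {"eps_id": 0, "reward": 1.0},
    {"eps_id": 1, "reward": 0.5},
]
print(estimate(batch))  # [2.0, 0.5]
```

The point is only the split: with split_batch_by_episode=True each episode is estimated separately, while the False path treats the whole batch as one dataset, which is why only the bandit case reaches estimate_on_dataset.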

So in the above example, we evaluate on a batch of 10 episodes collected from 1 rollout worker.

You can find the evaluation config docs here: Getting Started with RLlib — Ray 2.4.0

And the relevant code here: ray/algorithm.py at master · ray-project/ray · GitHub