Off-policy evaluation - how to control batch sample size

steff · May 17, 2023, 8:35pm

How severely does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

How is the default batch sample size that is passed to the OffPolicyEstimator.estimate(…) method computed?

OffPolicyEstimator.estimate(batch: SampleBatchType, split_batch_by_episode: bool = True)

Is there a way to pass a batch sample size parameter through the off-policy evaluation API? My current config is:

from ray.rllib.algorithms.dqn import DQNConfig
from ray.rllib.offline.estimators import WeightedImportanceSampling

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_="/tmp/cartpole-out")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "/tmp/cartpole-eval"},
        off_policy_estimation_methods={
            "wis": {"type": WeightedImportanceSampling},
        },
    )
)

Thanks,
Stefan

Rohan138 · May 18, 2023, 6:10pm

Currently we estimate on the whole dataset using estimator.estimate_on_dataset, splitting the dataset across all eval workers: ray/weighted_importance_sampling.py at master · ray-project/ray · GitHub

The effective batch size is eval_dataset.count() // evaluation_num_workers
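
As a rough illustration of that arithmetic for the estimate_on_dataset path (a sketch with made-up numbers, not RLlib code; per_worker_batch_size is a hypothetical helper introduced only for this example):

```python
def per_worker_batch_size(dataset_count: int, num_eval_workers: int) -> int:
    """Rows each evaluation worker would see if the dataset is split evenly.

    Mirrors the expression eval_dataset.count() // evaluation_num_workers;
    both arguments are plain integers here, not RLlib objects.
    """
    if num_eval_workers < 1:
        raise ValueError("num_eval_workers must be at least 1")
    # Integer division: any remainder rows are not counted in this figure.
    return dataset_count // num_eval_workers


# Hypothetical dataset of 10,000 rows split across 4 evaluation workers.
print(per_worker_batch_size(10_000, 4))
```

With a single evaluation worker, as in the config above, the result would simply equal the dataset row count.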

steff · May 18, 2023, 7:19pm

When the off-policy evaluation associated with this code executes, it invokes WeightedImportanceSampling.estimate_on_single_episode(…). How can I get it to estimate on the whole dataset via the WeightedImportanceSampling.estimate_on_dataset(…) method that you mentioned?

config = (
    DQNConfig()
    .environment(env="CartPole-v1")
    .framework("torch")
    .offline_data(input_="/tmp/cartpole-out")
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        evaluation_num_workers=1,
        evaluation_duration_unit="episodes",
        evaluation_config={"input": "/tmp/cartpole-eval"},
        off_policy_estimation_methods={
            "wis": {"type": WeightedImportanceSampling},
        },
    )
)

Rohan138 · May 19, 2023, 6:29pm

Ah, my fault: currently we only call estimate_on_dataset for bandit problems, i.e. when split_batch_by_episode = False.

Otherwise, we sample batches from the evaluation workers and call estimator.estimate(batch), which splits the batch into episodes and calls estimate_on_single_episode on each one. In this case the overall batch size is determined by your evaluation config (evaluation_duration and evaluation_duration_unit).

So in the above example, we evaluate on a batch of 10 episodes collected from 1 rollout worker.
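
The control flow described here can be sketched roughly as follows. This is a simplified illustration in plain Python, not the RLlib implementation: the flat list-of-dicts batch, the "eps_id" and "reward" keys, split_by_episode, and undiscounted_return are all assumptions made for the example, and the per-episode function is a plain reward sum standing in for the real estimator logic.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical flat step record, e.g. {"eps_id": 3, "reward": 1.0}.
Step = Dict[str, float]


def split_by_episode(batch: List[Step]) -> List[List[Step]]:
    """Group the steps of a flat batch by their episode id, keeping order."""
    episodes = defaultdict(list)
    for step in batch:
        episodes[step["eps_id"]].append(step)
    return list(episodes.values())


def estimate(
    batch: List[Step],
    estimate_on_single_episode: Callable[[List[Step]], float],
) -> List[float]:
    """Split the batch into episodes and run the per-episode estimator on each."""
    return [estimate_on_single_episode(ep) for ep in split_by_episode(batch)]


def undiscounted_return(episode: List[Step]) -> float:
    """Stand-in per-episode estimate: the sum of rewards (not WIS)."""
    return sum(step["reward"] for step in episode)


# A batch of 10 episodes with 5 steps each, loosely matching
# evaluation_duration=10 with evaluation_duration_unit="episodes".
batch = [{"eps_id": i // 5, "reward": 1.0} for i in range(50)]
per_episode = estimate(batch, undiscounted_return)
print(len(per_episode))
```

In this sketch the number of per-episode estimates follows directly from how many episodes the sampled batch contains, which is why changing the evaluation duration in the config is what changes the effective batch size.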

Rohan138 · May 19, 2023, 6:34pm

You can find the evaluation config docs here: Getting Started with RLlib — Ray 2.4.0

And the relevant code here: ray/algorithm.py at master · ray-project/ray · GitHub
