Usage of MultiAgentSampleBatchBuilder

Hello dear community,

I have a question regarding the ray.rllib.evaluation.sample_batch_builder file, and I am not 100% familiar with RLlib yet :slightly_smiling_face:.
I want to use expert knowledge data from another source to pretrain an algorithm, such as the parameter-sharing version of DQN or PPO, with this data.
Therefore, I thought about using the SampleBatchBuilder to prepare this offline data for pretraining.
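
To make the goal concrete: once the expert data has been written out as JSON episodes, I would expect to feed it back in through RLlib's offline "input" config option, roughly like this (just a sketch; the env name and data path are hypothetical placeholders):

    import ray
    from ray.rllib.agents.dqn import DQNTrainer

    ray.init()
    trainer = DQNTrainer(
        env="my_multi_agent_env",  # hypothetical, assumed registered beforehand
        config={
            "input": "/tmp/expert-data",  # directory of JSON files with expert samples
            "input_evaluation": [],       # skip off-policy estimation while pretraining
        },
    )
    for _ in range(10):
        trainer.train()  # learns purely from the offline expert data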

Unfortunately, I couldn't find any example of how to use the MultiAgentSampleBatchBuilder. The problem is:
How do I define the policy map, which is mandatory, in advance?
Do I have to instantiate a policy and hand it to the function?
How do I instantiate the policy?

MultiAgentSampleBatchBuilder:
https://docs.ray.io/en/master/_modules/ray/rllib/evaluation/sample_batch_builder.html

Hey @Bimser, the MultiAgentSampleBatchBuilder is deprecated. You can use the SimpleListCollector instead (ray.rllib.evaluation.collectors.simple_list_collector::SimpleListCollector).

But yes, you need a fully instantiated policy, passed inside the policy_map, to the constructor (for both MultiAgentSampleBatchBuilder and SimpleListCollector).

You can do this:

from ray.rllib.agents.dqn.dqn_tf_policy import DQNTFPolicy

policy_map = {
    "default_policy": DQNTFPolicy(obs_space, act_space, config)
}
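
The obs_space, act_space, and config arguments above are placeholders. A minimal, hypothetical way to fill them in (in practice you would take the spaces from your environment):

    import numpy as np
    from gym.spaces import Box, Discrete

    # Made-up spaces; replace with your env's real ones.
    obs_space = Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
    act_space = Discrete(2)
    config = {}  # merged with DQN's default config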

Hey @sven1977, thanks for the fast response!

I tried to use the SimpleListCollector in the following way:

    from ray.rllib.agents.dqn.dqn_torch_policy import DQNTorchPolicy
    from ray.rllib.evaluation.collectors.simple_list_collector import SimpleListCollector
    from ray.rllib.evaluation.episode import MultiAgentEpisode
    from ray.rllib.offline.json_writer import JsonWriter

    # env, obs, action_dict, obs_space, act_space, number (of agents),
    # step_counter, sample_batch_name, and MyCallback are defined
    # elsewhere in my script.
    policies = {"shared_policy": DQNTorchPolicy(obs_space, act_space, {})}
    list_collector = SimpleListCollector(policy_map=policies,
                                         clip_rewards=False,
                                         callbacks=MyCallback())
    episode = MultiAgentEpisode(policies=policies,
                                policy_mapping_fn=lambda agent_id: "shared_policy",
                                batch_builder_factory=lambda: None,
                                extra_batch_callback=lambda x: None,
                                env_id=0)
    writer = JsonWriter(sample_batch_name)

    # Register each agent's initial observation.
    for agent_id in range(number):
        list_collector.add_init_obs(episode, agent_id, 0, "shared_policy", -1,
                                    obs[agent_id])

    # One env step, then record action/reward/next-obs for every agent.
    obs, reward, done, info = env.step(action_dict)
    for agent_index in range(number):
        list_collector.add_action_reward_next_obs(
            episode_id=episode.episode_id,
            agent_id=agent_index,
            env_id=0,
            policy_id="shared_policy",
            agent_done=done[agent_index],
            values=dict(
                t=step_counter,
                actions=action_dict[agent_index],
                rewards=reward[agent_index],
                dones=done[agent_index],
                infos={},
                new_obs=obs[agent_index],
            ),
        )

    # Build the episode's batch and write it to JSON.
    batch = list_collector.postprocess_episode(episode=episode, build=True)
    episode.add_extra_batch(batch)
    list_collector.episode_step(episode.episode_id)
    writer.write(batch)

Unfortunately, this created a JSON file with only three actions, rewards, and observations in it. I want to create a file that contains the data for the whole episode, i.e. everything produced by .step() on the multi-agent environment.
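
For reference, what I am trying to end up with is a per-timestep loop over the whole episode, roughly like the sketch below (based on my code above and not verified against the collector API; env, number, act_space, and the collector/episode/writer setup are as before, and the random actions are just placeholders):

    # Sketch: step the env until the episode ends and record every transition.
    done = {"__all__": False}
    step_counter = 0
    while not done["__all__"]:
        action_dict = {i: act_space.sample() for i in range(number)}
        obs, reward, done, info = env.step(action_dict)
        for agent_index in range(number):
            list_collector.add_action_reward_next_obs(
                episode_id=episode.episode_id,
                agent_id=agent_index,
                env_id=0,
                policy_id="shared_policy",
                agent_done=done[agent_index],
                values=dict(
                    t=step_counter,
                    actions=action_dict[agent_index],
                    rewards=reward[agent_index],
                    dones=done[agent_index],
                    infos={},
                    new_obs=obs[agent_index],
                ),
            )
        list_collector.episode_step(episode.episode_id)
        step_counter += 1

    batch = list_collector.postprocess_episode(episode=episode, build=True)
    writer.write(batch)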

I hope you can help me with that concern.

Hi @Bimser, any update on your end? I have the same question and would appreciate it if you could share some insights. Thanks!