Adding custom data to the training batch while sampling data from the environment

I want to add custom data to the training batch while sampling data from the environment. For example, I would like to add an additional reward term for each state/action pair that has the same shape as the environment reward but comes from another neural network, and then combine it with the original environment reward when computing the final loss.

Any help would be appreciated. Thank you very much.

Hey @dev1dze,

With RLlib you can add an extra reward term for each state or action in your custom environment by following a few steps. This lets you adapt the reward structure to the specifics of your problem domain and potentially improve the performance of your reinforcement learning agent.
:point_down: :point_down:
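For example, at the environment level you could do the shaping in a reward wrapper. This is only a minimal sketch of the idea: `AuxRewardNet`, its architecture, and the `0.1` weight are placeholders I made up, not anything RLlib provides.

```python
import gymnasium as gym
import torch
import torch.nn as nn


class AuxRewardNet(nn.Module):
    """Hypothetical network that scores (obs, action) pairs."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class ShapedRewardWrapper(gym.Wrapper):
    """Adds a learned reward term to the environment's own reward."""

    def __init__(self, env, aux_net: AuxRewardNet, weight: float = 0.1):
        super().__init__(env)
        self.aux_net = aux_net
        self.weight = weight
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        with torch.no_grad():
            aux_r = self.aux_net(
                torch.as_tensor(self._last_obs, dtype=torch.float32),
                torch.atleast_1d(torch.as_tensor(action, dtype=torch.float32)),
            ).item()
        self._last_obs = obs
        # Keep the original reward around in `info` in case you need it later.
        info["env_reward"] = reward
        return obs, reward + self.weight * aux_r, terminated, truncated, info
```

You would then register the wrapped environment (e.g. via `ray.tune.register_env`) and point your algorithm config's `env` at that registered name, so RLlib only ever sees the combined reward.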

Hi @edison ,

Thanks for your reply. I think I should be looking for the answer in the *ray/rllib/env/single_agent_env_runner.py* file, since that is the place where
env.step(action) is called, and one could maybe push some more metadata into the episode there? Still not entirely sure yet…
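Something along these lines is what I have in mind, except via the callback hooks rather than editing the env runner itself. Just a rough sketch on the classic API stack; the appended `0.0` is a placeholder for whatever my second network would produce:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class EpisodeDataCallbacks(DefaultCallbacks):
    """Stashes custom per-step data on the episode while it is being sampled."""

    def on_episode_start(self, *, episode, **kwargs):
        # `user_data` is a free-form dict carried by the (old-stack) episode object.
        episode.user_data["aux_values"] = []

    def on_episode_step(self, *, episode, **kwargs):
        # Compute the extra quantity here (e.g. from another network) and
        # append it; it stays attached to this episode during sampling.
        episode.user_data["aux_values"].append(0.0)  # placeholder value
```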

@dev1dze,

You might want to model it on something like this:
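A rough sketch of one way to do it with the classic `DefaultCallbacks.on_postprocess_trajectory` hook, which sees every sampled trajectory before it is concatenated into the train batch. The auxiliary reward net and the `0.1` weight below are placeholders, not part of RLlib:

```python
import numpy as np
import torch

from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import SampleBatch


class AuxRewardCallbacks(DefaultCallbacks):
    """Adds an auxiliary, learned reward term to every sampled trajectory."""

    def __init__(self):
        super().__init__()
        # Placeholder: build / load your auxiliary reward network here.
        self.aux_net = torch.nn.Linear(4, 1)  # e.g. CartPole obs -> scalar
        self.weight = 0.1

    def on_postprocess_trajectory(
        self,
        *,
        worker,
        episode,
        agent_id,
        policy_id,
        policies,
        postprocessed_batch,
        original_batches,
        **kwargs,
    ):
        obs = torch.as_tensor(
            postprocessed_batch[SampleBatch.OBS], dtype=torch.float32
        )
        with torch.no_grad():
            aux_r = self.aux_net(obs).squeeze(-1).numpy()

        # Keep the original env rewards as an extra column in the batch ...
        postprocessed_batch["env_rewards"] = postprocessed_batch[
            SampleBatch.REWARDS
        ].copy()
        # ... and let training see the combined reward.
        postprocessed_batch[SampleBatch.REWARDS] = (
            postprocessed_batch[SampleBatch.REWARDS] + self.weight * aux_r
        ).astype(np.float32)
```

You would register it on your algorithm config, e.g. `config.callbacks(AuxRewardCallbacks)`. One caveat: this hook runs after the policy's own postprocessing, so for algorithms that compute advantages there (e.g. PPO's GAE) the combined reward will not flow into the already-computed advantages; in that case shaping inside the environment, as in the wrapper sketch above, may be the simpler route.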