What is the recommended way to load a dataset into an RL env so that the reset function can sample from it? The data is rather complex to generate on the fly from the correct distribution, so instead I sample from pre-generated data, which I pass into the env for each rollout worker.
I realised I somehow got a 4-5x slowdown in training when I load data that takes more memory (400 vs. 635 MiB) when using 50 workers. However, I am not even close to running out of RAM (I am on a 1 TB RAM machine), so I am unsure why this is the case…
Is there maybe a nice way of sampling from the data using a Ray dataset or something similar? It is not obvious to me how that would be done, since the documentation presents it as disconnected from RLlib. Or is there some way of sharing the same memory between workers? Any help would be greatly appreciated.
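For context, this is roughly the pattern I use right now (a minimal sketch; the class name `MyEnv`, the `episodes` key, and the spaces are just placeholders):

```python
import numpy as np
import gymnasium as gym


class MyEnv(gym.Env):
    """Sketch of an env that samples its initial state from preloaded data."""

    def __init__(self, env_config):
        # The full dataset is passed in via env_config, so every rollout
        # worker ends up holding its own in-memory copy of it.
        self.data = env_config["episodes"]  # e.g. a list/array of start states
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Sample a starting state uniformly from the preloaded data.
        idx = self.np_random.integers(len(self.data))
        self._state = np.asarray(self.data[idx], dtype=np.float32)
        return self._state, {}

    def step(self, action):
        # ... environment dynamics elided ...
        return self._state, 0.0, False, False, {}
```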
If you are using “only” 50 workers and the dataset is 600 MB large, every worker can have its own copy of the dataset. This will take time to set up.
I recommend using Ray Datasets. You can also sample with them; Ray Datasets supports chunk-based shuffling of your data.
Or if your experiment can tolerate it, maybe it’s enough to sample sequentially.
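For example, something like this (an untested sketch; the parquet path is a placeholder for wherever your episode data lives):

```python
import ray

ray.init()

# Creating the dataset reads metadata; blocks are fetched as they are consumed.
ds = ray.data.read_parquet("s3://my-bucket/episodes/")  # placeholder path

# Cheap chunk-level shuffling: reorder whole blocks rather than individual rows.
ds = ds.randomize_block_order()

# Or a full row-level shuffle (more expensive):
# ds = ds.random_shuffle()

# If sequential sampling is tolerable, just iterate in order:
for row in ds.iter_rows():
    ...  # hand `row` to your env / reset logic
```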
I am pretty sure the slowdown is not only a consequence of the start-up time. I should admit that my environment and setup are rather complicated, so it is possible I am missing something completely unrelated to Ray + RLlib, but I ran the same experiment (training with PPO) with just the datasets swapped, and these differences in speed appeared. Env throughput is definitely not the bottleneck.
Let's say I use a Ray dataset: how would I go about feeding it into the RL env in a way that the dataset is shared? I assume creating one and passing it in as an argument wouldn't work. What you linked looks like it is used in the trainers directly; I see how I could do that, but that seems disconnected from the env.
There you go!
Note that creating the dataset does not itself involve moving the data. So you can create the dataset separately in every env and pull an element from it every time you reset (or buffer a few).
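Roughly like this (a sketch, assuming gymnasium-style envs; names like `data_path` and the `obs` column are placeholders):

```python
import numpy as np
import gymnasium as gym
import ray


class DatasetBackedEnv(gym.Env):
    """Sketch of an env that creates its own Ray dataset and pulls one
    row per reset, instead of holding a full in-memory copy of the data."""

    def __init__(self, env_config):
        # Dataset creation is lazy/metadata-only; blocks are only fetched
        # as rows are actually consumed.
        self._ds = ray.data.read_parquet(env_config["data_path"])
        self._rows = iter(self._ds.randomize_block_order().iter_rows())
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,))
        self.action_space = gym.spaces.Discrete(2)

    def _next_row(self):
        try:
            return next(self._rows)
        except StopIteration:
            # Epoch exhausted: reshuffle block order and start over.
            self._rows = iter(self._ds.randomize_block_order().iter_rows())
            return next(self._rows)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        row = self._next_row()
        self._state = np.asarray(row["obs"], dtype=np.float32)  # placeholder column
        return self._state, {}

    def step(self, action):
        # ... environment dynamics elided ...
        return self._state, 0.0, False, False, {}
```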