How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I have a multi-agent system with 3 policies (let's call them M, X, and Y). The episode proceeds in cycles of two timesteps: first M acts stochastically, and on the next timestep, depending on M's action, either X or Y is selected to act. I want X (and Y) to be trained on a specific amount of experience.
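
For context, the mapping from agents to policies is straightforward; roughly like this (the agent IDs are illustrative, in my env each agent is simply served by the policy of the same name):

```python
# Sketch of my policy mapping (agent IDs are illustrative): each agent is
# handled by the policy of the same name.
def policy_mapping_fn(agent_id, *args, **kwargs):
    # "M" acts on the first step of each cycle, "X" or "Y" on the next one.
    return agent_id
```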
Currently, if I set `train_batch_size` to e.g. 4, then X may be trained on a batch of at most 4: if X acted 20% of the time, it only gets 20% of the experience. Moreover, since the rate at which X acts is stochastic, there is no way to get a precise batch size for X at training time.
I should also mention (although I am not sure if this is relevant) that M is not a trainable policy.
Moreover, given that at each timestep only one agent acts, setting `count_steps_by` to either `agent_steps` or `env_steps` yields exactly the same count.
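
For reference, my config looks roughly like the one below (if I got the new-style `AlgorithmConfig` API right; PPO and the env name are just placeholders, the parts relevant to my question are `train_batch_size`, `policies_to_train`, and `count_steps_by`):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="my_multi_agent_env")    # placeholder env name
    .training(train_batch_size=4)             # batch size from the example above
    .multi_agent(
        policies={"M", "X", "Y"},
        policy_mapping_fn=policy_mapping_fn,  # as sketched above
        policies_to_train=["X", "Y"],         # M is not trainable
        count_steps_by="agent_steps",         # same count as "env_steps" in my case
    )
)
```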
Finally, I put the impact as medium because you can get an expected batch size for e.g. X by estimating how often it acts on average and scaling `train_batch_size` up accordingly; however, this only works in expectation over a large number of batches, and a guaranteed exact batch size for X would be preferable.
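
Concretely, the estimate-based workaround I mean is just this scaling (numbers taken from the example above):

```python
estimated_x_rate = 0.2   # X acts on roughly 20% of steps (only an estimate)
desired_x_batch = 4      # amount of experience I actually want X trained on

# Scale the global batch so that X gets ~4 samples on average (not guaranteed).
train_batch_size = round(desired_x_batch / estimated_x_rate)  # -> 20
```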
Is there a way to train the agents asynchronously (e.g., each with its own counter and its own experience)? Or can I change the way steps are counted, for instance counting them in such a way that training only happens once X and Y each have at least `train_batch_size` experiences?
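
To make that last question more concrete: I can already observe the per-policy counts with a callback like the one below (just a sketch, assuming a recent Ray where callbacks live under `ray.rllib.algorithms.callbacks`; `PerPolicyStepCounter` is a name I made up). What I don't know is how to use such counters to gate training so that X and Y each get exactly `train_batch_size` experiences.

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks
from ray.rllib.policy.sample_batch import MultiAgentBatch


class PerPolicyStepCounter(DefaultCallbacks):
    """Prints how many experiences each policy collected in one sample phase."""

    def on_sample_end(self, *, worker, samples, **kwargs):
        if isinstance(samples, MultiAgentBatch):
            for policy_id, batch in samples.policy_batches.items():
                # batch.count = number of timesteps on which this policy acted
                print(f"{policy_id}: {batch.count} steps in this sample batch")
```

I could register this via `.callbacks(PerPolicyStepCounter)` on the config above, but that only lets me observe the per-policy counts, not change when training happens.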