Different results on GPU and CPU

ardian-selmonaj · October 13, 2023, 9:01pm

How severe does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am training a multi-agent RL algorithm, very similar to this example but with my own custom environment. I noticed that changing only ‘num_gpus’ from 0 to 1 completely changes my results: training on the GPU gives worse results than on the CPU. I know that results can differ somewhat across hardware, but I would not expect the differences to be this large. See the attached image of mean rewards (CPU left, GPU right).

I have a notebook with an i9-13900H and an RTX 3080 Ti. I am using Ray 2.4.0, torch 2.0 and CUDA 12.2.
I know I should update to Ray 2.7, but I haven't had time to adjust my code to the new API.

What may be the reasons for that?
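
One way to narrow this down is to check whether the CPU/GPU gap is larger than the spread you get from simply changing the random seed on a single device. The following is a minimal sketch, assuming the final mean reward of each run has been collected by hand for a few seeds per device; the function name `device_gap_vs_seed_noise` and the numbers are hypothetical and are not measurements from this post.

```python
from statistics import mean, stdev


def device_gap_vs_seed_noise(cpu_rewards, gpu_rewards):
    """Compare the CPU/GPU gap in final mean reward with seed-to-seed spread.

    Each argument is a list of final mean rewards, one entry per seed.
    Returns (gap, noise, ratio). A ratio well above 1 suggests the gap is
    larger than what reseeding alone produces.
    """
    if len(cpu_rewards) < 2 or len(gpu_rewards) < 2:
        raise ValueError("need at least two seeds per device")
    # Absolute difference between the per-device averages.
    gap = abs(mean(cpu_rewards) - mean(gpu_rewards))
    # Use the larger of the two sample standard deviations as the noise level.
    noise = max(stdev(cpu_rewards), stdev(gpu_rewards))
    ratio = gap / noise if noise > 0 else float("inf")
    return gap, noise, ratio


# Hypothetical numbers for illustration only.
cpu = [10.0, 12.0, 11.0]
gpu = [4.0, 6.0, 5.0]
gap, noise, ratio = device_gap_vs_seed_noise(cpu, gpu)
print(f"gap={gap:.2f} noise={noise:.2f} ratio={ratio:.2f}")
```

If the ratio stays close to 1 across several seeds, the difference is probably ordinary run-to-run variance; if it is consistently large, the device setting itself is the more likely cause.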
