How can I set the NumPy seed when using map_batches?

Hi all,

I’m building a data processing pipeline and I’m performing a transformation that uses numpy.random.

How should I set the NumPy seed? Inside the function I pass to map_batches?
If I set it outside, could I run into a race condition?

Thanks in advance!

Hey @ssamdav, what sort of race condition do you expect to see? Each UDF is run in a separate process, so I don’t think you’d have issues if you seed NumPy in your UDF.
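In case it helps, here is a minimal sketch of what seeding inside the UDF could look like. It is only an illustration under some assumptions: the batch is a dict of NumPy arrays, the column names `id` and `value` are hypothetical, and `BASE_SEED` is a made-up constant. It uses a local `np.random.default_rng` generator instead of the global `np.random.seed`, and mixes the first row id into the seed so that different batches don't all draw the same numbers. The example calls the function directly so it runs without a cluster.

```python
import numpy as np

BASE_SEED = 1234  # hypothetical base seed, pick your own


def add_noise(batch):
    """UDF sketch: add reproducible Gaussian noise to the 'value' column.

    Assumes `batch` is a dict mapping column names to NumPy arrays and
    that it has hypothetical 'id' (non-negative ints) and 'value' columns.
    """
    # Derive a per-batch seed from the base seed and the first row id,
    # so each batch gets its own stream but reruns give the same result.
    first_id = int(batch["id"][0])
    rng = np.random.default_rng([BASE_SEED, first_id])

    # The generator is local to this call, so no global state is shared.
    noise = rng.normal(size=len(batch["value"]))
    return {"id": batch["id"], "value": batch["value"] + noise}


# Call the UDF directly on two sample batches to check reproducibility.
batch_one = {"id": np.arange(0, 4), "value": np.zeros(4)}
batch_two = {"id": np.arange(4, 8), "value": np.zeros(4)}

out_a = add_noise(batch_one)
out_b = add_noise(batch_one)  # same batch again: same noise
out_c = add_noise(batch_two)  # different batch: different noise
```

With Ray Data you would pass the function as `ds.map_batches(add_noise)`, assuming the batches arrive in NumPy format. If you seed with one fixed value and nothing batch-specific, every batch will get identical random numbers, which is usually not what you want.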