How can I set up numpy seed when doing map_batches?

ssamdav · May 3, 2023, 3:07pm

Hi all,

I’m building a data processing pipeline and I’m performing a transformation that uses numpy.random.

How should I set the NumPy seed? Inside the function passed to map_batches?
If I set it outside, could I run into a race condition?

Thanks in advance!

bveeramani · May 4, 2023, 8:19pm

Hey @ssamdav, what sort of race condition do you expect to see? Each UDF is run in a separate process, so I don’t think you’d have issues if you seed NumPy in your UDF.
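Here is a minimal sketch of seeding inside the UDF. It assumes a callable-class UDF that receives batches as dicts of NumPy arrays (as with `batch_format="numpy"`). It also swaps the global `np.random.seed` for a per-instance `np.random.default_rng` Generator, so no global RNG state is shared at all. The class name `AddNoise` and the `id` column are hypothetical, for illustration only.

```python
from typing import Dict

import numpy as np


class AddNoise:
    """Hypothetical callable-class UDF in the shape map_batches accepts."""

    def __init__(self, seed: int = 0):
        # Seed once per instance (i.e. once per worker), not once per batch,
        # so successive batches get different draws from one seeded stream.
        self.rng = np.random.default_rng(seed)

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        # "id" is an assumed column name used only for this example.
        values = np.asarray(batch["id"], dtype=np.float64)
        # Draw from this instance's private Generator; no global NumPy state is touched.
        noise = self.rng.normal(loc=0.0, scale=1.0, size=values.shape)
        return {"id": batch["id"], "noisy": values + noise}
```

With Ray Data, a class like this would be passed as something like `ds.map_batches(AddNoise, batch_format="numpy", fn_constructor_kwargs={"seed": 42}, ...)`, so the constructor (and the seeding) runs once per worker rather than once per batch. How you request the actor pool (`compute=ray.data.ActorPoolStrategy(...)` in older releases, `concurrency=` in newer ones) depends on your Ray version. If every worker gets the same seed, they will all draw the same random sequence, which may or may not be what you want.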
