Read data from HDFS with Ray

murat · July 30, 2021, 8:22pm

Hi,
I want to read data from HDFS, but I got the error "Unable to load libhdfs". I used Modin. What can I do?
Thank you.

sangcho · July 30, 2021, 8:26pm

cc @Alex Maybe good Dataset use case?

ericl · August 3, 2021, 8:11pm

You can pass filesystem= to the Ray Dataset read APIs to specify a Hadoop pyarrow filesystem: pyarrow.fs.HadoopFileSystem (Apache Arrow v5.0.0 documentation)
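
A minimal sketch of that suggestion, assuming Parquet data and a working Hadoop client on the machine. The helper names (missing_hdfs_env, read_parquet_from_hdfs) and the example path are hypothetical, not part of Ray or Arrow. The environment check covers the variables the Arrow documentation lists for loading libhdfs, which is the usual place to look when "Unable to load libhdfs" appears; it is a hint, not a guaranteed fix.

```python
import os

# Variables the Arrow docs describe for locating libhdfs and the Hadoop jars.
# ARROW_LIBHDFS_DIR is optional and only needed when libhdfs.so lives
# outside $HADOOP_HOME/lib/native, so it is not treated as required here.
REQUIRED_ENV = ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH")


def missing_hdfs_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def read_parquet_from_hdfs(path, host="default", port=0):
    """Read Parquet files from HDFS into a Ray Dataset (hypothetical helper).

    host="default" asks libhdfs to use fs.defaultFS from the Hadoop config;
    pass an explicit namenode host and port if that does not apply.
    """
    # Imported lazily so the environment check above works without
    # ray or pyarrow installed.
    import ray
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host, port)
    return ray.data.read_parquet(path, filesystem=hdfs)
```

Calling missing_hdfs_env() first and then something like read_parquet_from_hdfs("/data/events") (an assumed path) keeps the libhdfs setup problem separate from any Ray problem.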

murat · August 7, 2021, 9:14pm

Hi @ericl,
Should I install ray v2.0.0.dev0?

I used Arrow, but I got an error:

AttributeError: module 'pickle' has no attribute 'PickleBuffer'

sangcho · August 9, 2021, 10:17pm

cc @Alex can you answer what’s the best action to get around his issue?

Alex · August 9, 2021, 10:54pm

Hey @murat, I don’t have an hdfs cluster handy, but what version of pyarrow do you have (pip freeze | grep pyarrow)?
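
If it is easier to paste, the same information can be collected from Python. This is only a sketch, and environment_report is a hypothetical helper name. It also reports whether the standard pickle module has PickleBuffer, which was added in Python 3.8, so a False there points at an older interpreter rather than at pyarrow itself.

```python
import pickle
import sys

try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    metadata = None  # older interpreter; fall back to pip freeze


def environment_report(packages=("pyarrow", "ray", "modin", "cloudpickle")):
    """Collect the interpreter version and installed package versions."""
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "pickle_has_PickleBuffer": hasattr(pickle, "PickleBuffer"),
    }
    for name in packages:
        if metadata is None:
            report[name] = "unknown (use pip freeze)"
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```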

mmuru · August 16, 2021, 3:14pm

pip install cloudpickle should fix the pickle error.

I think it would be nice to include a requirements.txt for the Ray.data module.

devin-petersohn · August 23, 2021, 4:23pm

@Alex @sangcho would it be possible to get pinged when a bug in Modin pops up here in the future?

sangcho · August 25, 2021, 5:16am
