Read data from HDFS with Ray

Hi,
I want to read data from HDFS using Modin, but I got the error "Unable to load libhdfs". What can I do?
Thank you.

cc @Alex Maybe a good Datasets use case?

Maybe try Datasets: Distributed Arrow on Ray — Ray v2.0.0.dev0

You can pass in filesystem= to the read APIs to specify a Hadoop pyarrow filesystem: pyarrow.fs.HadoopFileSystem — Apache Arrow v5.0.0
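
A minimal sketch of what that could look like, assuming a Parquet dataset and a cluster whose address comes from the local Hadoop configuration. The helper names `missing_hadoop_env` and `read_parquet_from_hdfs` are made up for this example. The "Unable to load libhdfs" error usually means pyarrow cannot find the native Hadoop client, which it locates through HADOOP_HOME, JAVA_HOME and CLASSPATH (plus the optional ARROW_LIBHDFS_DIR), so the sketch checks those first. The Ray and pyarrow imports are deferred so the helpers can be defined on a machine without a Hadoop client.

```python
import os

# Environment variables pyarrow relies on to find libhdfs and the Hadoop jars.
REQUIRED_HADOOP_VARS = ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH")


def missing_hadoop_env():
    """Return the required Hadoop variables that are unset or empty."""
    return [name for name in REQUIRED_HADOOP_VARS if not os.environ.get(name)]


def read_parquet_from_hdfs(path, host="default", port=0):
    """Read a Parquet dataset from HDFS into a Ray Dataset.

    host="default" asks libhdfs to use fs.defaultFS from core-site.xml;
    replace it with your namenode host and port if needed (assumption:
    the data is stored as Parquet).
    """
    missing = missing_hadoop_env()
    if missing:
        raise RuntimeError("Set these variables first: " + ", ".join(missing))

    # Deferred imports: only needed once we actually talk to the cluster.
    import ray
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host, port)
    return ray.data.read_parquet(path, filesystem=hdfs)
```

CLASSPATH is typically populated from the output of `hadoop classpath --glob`. With the environment in place, `read_parquet_from_hdfs("/path/to/data")` would return a Ray Dataset; the path here is a placeholder.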

Hi @ericl,
Should I install Ray v2.0.0.dev0?

I used Arrow, but I got an error:

AttributeError: module 'pickle' has no attribute 'PickleBuffer'

cc @Alex can you answer what’s the best action to get around this issue?

Hey @murat, I don’t have an HDFS cluster handy, but what version of pyarrow do you have (pip freeze | grep pyarrow)?
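
If it is easier to check from Python, here is a small sketch that gathers the same information in one place. The function name `environment_report` is made up for this example. It also reports whether `pickle.PickleBuffer` exists: that class was added to the standard library in Python 3.8, so on older interpreters it is only available through the pickle5 backport, which may be what the AttributeError is pointing at.

```python
import importlib
import pickle
import sys


def environment_report(packages=("pyarrow", "cloudpickle", "ray", "modin")):
    """Collect interpreter and package versions relevant to the error."""
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        # True on Python 3.8+, where pickle protocol 5 is built in.
        "has_pickle_buffer": hasattr(pickle, "PickleBuffer"),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # Not installed, or the install is broken; both count as unavailable.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, "=", value)
```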

Running pip install cloudpickle should fix the pickle error.

I think it would be nice to include a requirements.txt for the ray.data module.

@Alex @sangcho would it be possible to get pinged when a bug in Modin pops up here in the future?

@devin-petersohn will do!