Does ray dataset support a display method similar to dataframe

839576266 · December 20, 2022, 12:19pm

Hello,
I am using Ray to compute data indicator statistics, but I found that Ray cannot rename the index column the way pandas can. How should I deal with this situation?

cade · December 22, 2022, 9:30pm

Moving to Ray AIR category.

chengsu · December 23, 2022, 11:45pm

Hi @839576266 - could you provide example code for what you do in pandas? What do you want to achieve in Ray Data?

839576266 · December 26, 2022, 6:20am

Thank you for your reply.
I want to use Ray to compute data indicator statistics, but when the data is saved, I find that a Ray Dataset cannot output a column of indicator names (such as sum and mean) the way pandas does:

import pandas as pd
# Named `data` rather than `dict` to avoid shadowing the built-in type.
data = {'x_0': [1, 2, 3], 'x_1': [4, 5, 6]}
df = pd.DataFrame(data)
print(df.describe())

sjl · January 12, 2023, 6:53pm

Hi @839576266, currently when creating a Ray Dataset from an existing Pandas DataFrame (ray.data.from_pandas), the resulting Dataset does not carry over the index column.

For the code you are writing, is it a requirement that the indicator names from the Pandas DataFrame stay in the index? Or is it possible to move them out into a column with pd.DataFrame.reset_index()? If this is OK, then you could accomplish your desired result with something like:

>>> import pandas as pd
>>> dct = {'x_0': [1, 2, 3], 'x_1': [4, 5, 6]}
>>> df = pd.DataFrame(dct)
>>> df_summary = df.describe().reset_index()
>>> df_summary
   index  x_0  x_1
0  count  3.0  3.0
1   mean  2.0  5.0
2    std  1.0  1.0
3    min  1.0  4.0
4    25%  1.5  4.5
5    50%  2.0  5.0
6    75%  2.5  5.5
7    max  3.0  6.0
>>> import ray
>>> ds_summary = ray.data.from_pandas(df_summary)
>>> ds_summary.schema()
PandasBlockSchema(names=['index', 'x_0', 'x_1'], types=[dtype('O'), dtype('float64'), dtype('float64')])
>>> ds_summary.take_all()
[{'index': 'count', 'x_0': 3.0, 'x_1': 3.0}, {'index': 'mean', 'x_0': 2.0, 'x_1': 5.0}, {'index': 'std', 'x_0': 1.0, 'x_1': 1.0}, {'index': 'min', 'x_0': 1.0, 'x_1': 4.0}, {'index': '25%', 'x_0': 1.5, 'x_1': 4.5}, {'index': '50%', 'x_0': 2.0, 'x_1': 5.0}, {'index': '75%', 'x_0': 2.5, 'x_1': 5.5}, {'index': 'max', 'x_0': 3.0, 'x_1': 6.0}]
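
If the default column name `index` is not descriptive enough, the index can be given a name before it is moved into a column. The following is a minimal sketch on the pandas side only; the column name `statistic` is an illustrative choice, not something Ray or pandas requires. The resulting DataFrame can then be passed to ray.data.from_pandas in the same way as in the session above.

```python
import pandas as pd

df = pd.DataFrame({"x_0": [1, 2, 3], "x_1": [4, 5, 6]})

# rename_axis() names the (previously unnamed) index, so that
# reset_index() creates a column called "statistic" instead of "index".
# "statistic" is a hypothetical name chosen for this example.
df_summary = df.describe().rename_axis("statistic").reset_index()

print(list(df_summary.columns))
```

Naming the index first keeps the renaming in a single chained expression; calling `df.describe().reset_index().rename(columns={"index": "statistic"})` should give the same result.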

839576266 · January 16, 2023, 5:47am

Thanks for your reply. I will try to solve my problem with your method.
