Ray read text does not give whole text

prasanna_kumar · May 27, 2024, 9:51am

How can I get the whole page text as one object when using Ray's read_text? Are there any parameters I should change or update?

It currently returns:

{"text": "line1"}
{"text": "line2"}

but I am expecting:

{"text": "line1" + "line2"}

Sam_Chan · May 28, 2024, 6:20pm

ray.data.read_text creates one row per line of the file, which is why you get a separate {"text": ...} row for each line. To get the whole file as one object, one option is to read it with ray.data.read_binary_files, which creates one row per file, and decode the bytes yourself. Here’s an example:

import ray

# read_binary_files yields one row per file, with the raw contents under "bytes"
ds = ray.data.read_binary_files("s3://anonymous@ray-example-data/this.txt")
# Decode each file's bytes into a single string (assumes the file is UTF-8)
ds = ds.map(lambda row: {"text": row["bytes"].decode("utf-8")})

result = ds.take(1)[0]["text"]
print(result)
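
If the file is small enough to fit in the driver's memory, another option is to keep read_text and join the rows in plain Python. The following is a minimal sketch, not an official recipe: join_lines and read_whole_text are hypothetical helper names, and it assumes the rows come back in file order. Note that read_text drops empty lines by default (drop_empty_lines=True), so blank lines will be missing from the joined text unless you pass drop_empty_lines=False.

```python
def join_lines(rows, sep="\n"):
    """Join the "text" field of each row into one string (hypothetical helper)."""
    return sep.join(row["text"] for row in rows)


def read_whole_text(path):
    """Read a text file with Ray Data and return its lines joined into one string.

    Assumes the file fits in driver memory, since take_all() pulls
    every row back to the driver.
    """
    import ray  # imported here so join_lines can be used without Ray installed

    ds = ray.data.read_text(path)   # one {"text": ...} row per line
    return join_lines(ds.take_all())
```

For example, read_whole_text("s3://anonymous@ray-example-data/this.txt") would return the lines of that file joined with newlines.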

prasanna_kumar · May 29, 2024, 12:09pm

Hey @Sam_Chan, thanks for the response. What if I pass a list of files to read_text? How can I tell where doc1 starts and ends?
Thanks
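
For a list of files, one possible approach is to read them with ray.data.read_binary_files and pass include_paths=True, so that each row holds one whole file plus the path it came from. That way document boundaries never need to be reconstructed, because each document is its own row. A minimal sketch under those assumptions: decode_row and read_documents are hypothetical helper names, the file names in the usage note are placeholders, and the files are assumed to be UTF-8.

```python
def decode_row(row):
    """Turn one binary-file row into {"path", "text"} (hypothetical helper)."""
    return {"path": row["path"], "text": row["bytes"].decode("utf-8")}


def read_documents(paths):
    """Return a dataset with one row per file, each tagged with its source path."""
    import ray  # imported here so decode_row can be used without Ray installed

    # include_paths=True adds a "path" column next to the "bytes" column
    ds = ray.data.read_binary_files(paths, include_paths=True)
    return ds.map(decode_row)
```

Calling read_documents(["doc1.txt", "doc2.txt"]).take_all() should then give one dict per file, so the text of doc1 is simply the "text" value of the row whose "path" ends in doc1.txt.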
