How can I get the whole page text as one object when using ray.data.read_text? Are there any parameters I should change or update?
It currently returns:
{"text": "line1"}
{"text": "line2"}
but I am expecting:
{"text": "line1" + "line2"}
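ray.data.read_text emits one row per line, and as far as I know it has no parameter that returns a whole file as a single row. One option is to read the file with ray.data.read_binary_files instead, which returns one row per file with the raw contents in a "bytes" column, and then decode that into a single string. Here's an example:
import ray
# read_binary_files yields one row per file, with the raw contents under "bytes"
ds = ray.data.read_binary_files("s3://anonymous@ray-example-data/this.txt")
# Decode each file's bytes into a single "text" string (assumes UTF-8 content)
ds = ds.map(lambda row: {"text": row["bytes"].decode("utf-8")})
result = ds.take(1)[0]["text"]
print(result)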
Hey @Sam_Chan, thanks for the response. What if I pass a list of files to read_text? How can I tell where doc1 starts and ends?
Thanks
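For a list of files, one approach that may work is to read each file as a single row with ray.data.read_binary_files and pass include_paths=True, so every row carries the file it came from in a "path" column. Each document then starts and ends with its own row. The sketch below assumes UTF-8 text; decode_row and read_whole_files are hypothetical helper names, not part of Ray.

```python
from typing import Any, Dict, List


def decode_row(row: Dict[str, Any]) -> Dict[str, str]:
    """Turn one whole-file row into {"path": ..., "text": ...} (assumes UTF-8)."""
    return {"path": row["path"], "text": row["bytes"].decode("utf-8")}


def read_whole_files(paths: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: return one record per input file."""
    import ray  # imported lazily so decode_row can be used without Ray installed

    # include_paths=True adds a "path" column next to the "bytes" column
    ds = ray.data.read_binary_files(paths, include_paths=True)
    return ds.map(decode_row).take_all()


# Example call (doc1.txt and doc2.txt are placeholder paths):
#   for doc in read_whole_files(["doc1.txt", "doc2.txt"]):
#       print(doc["path"], len(doc["text"]))
```

Because the unit of work is a whole file, this suits documents that fit comfortably in memory; very large files would still need to be split some other way.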