ValueError: buffer source array is read-only with ds.map_batches and pandas as the batch format

Hi
I am having trouble processing text data using ds.map_batches with pandas as the batch format: I get ValueError: buffer source array is read-only. My code is described below.
I am using the Ray Dataset API to read Parquet files stored in S3:

ds = ray.data.read_parquet("S3//PATH")

The schema looks like this:

schema={'col A': string, 'col B': string, 'col C': list<element: string>}

Load the spaCy model:

nlp = spacy.load("en_core_web_lg")

I am doing basic things like lowercasing the text and converting it to a spaCy Doc. My transformation function:

def transform_batch(batch: pd.DataFrame) -> pd.DataFrame:
    batch = batch.copy(deep=True)
    batch['lower_text'] = batch['text'].map(str.lower)
    batch['spacy_docs'] = batch['lower_text'].map(nlp)
    return batch

Finally, I do:

transformed_ds = ds.map_batches(transform_batch, batch_format='pandas')

The transform_batch function above works fine as a standalone pandas function, but using it with Ray throws the error:

ValueError: buffer source array is read-only

I understand that Ray uses the Plasma object store, whose objects are immutable and so cannot be mutated in place. The Ray docs and a Ray team member on the community Slack suggested creating a copy of the object, as shown in the transform_batch function. However, I am still facing the same error. Can someone suggest a workaround?
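
For context, this is my understanding of the read-only behavior, sketched with plain NumPy rather than Ray. The array and the try_write helper are made up for illustration, and the message NumPy raises here is not the same as the one in my traceback, so treat it as an analogy for immutable memory and not a reproduction of my error:

```python
import numpy as np

# Hypothetical stand-in for a batch column backed by immutable shared memory.
values = np.arange(5)
values.setflags(write=False)  # mark the array as read-only


def try_write(arr):
    """Return True if an in-place write succeeds, False if the array is read-only."""
    try:
        arr[0] = 99
        return True
    except ValueError:
        # NumPy refuses in-place assignment on a read-only array.
        return False


original_writable = try_write(values)  # write attempt on the read-only array
copied = values.copy()                 # a copy owns fresh, writable memory
copy_writable = try_write(copied)      # write attempt on the copy

print(original_writable, copy_writable)
```

In this sketch the copy is writable while the original is not, which is why I expected batch.copy(deep=True) to be enough in my case.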