Ray unicode JSON formatting

I’m working on a project that uses Ray’s Data library to read JSONL files from one of my directories. The issue is that many of my JSONL files contain UTF-8 encoded characters (some of them French), and Ray seems to read and write these files with a different encoding.

I’m wondering if there is a way to configure read_json() and write_json() so that these Unicode characters are preserved and written back out as literal characters.

I did not find anything explicit in the documentation, but I’m not sure whether some kwargs might help with this or whether another solution exists.

Thanks.

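Solution: Ray’s ray.data.Dataset.write_json() calls pandas.DataFrame.to_json under the hood, and pandas forces ASCII output by default (force_ascii=True), which escapes non-ASCII characters as \uXXXX sequences. Pass force_ascii=False as a keyword argument to Ray’s write_json() call so that it is forwarded to pandas and the characters are written out unchanged.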
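
A minimal sketch of the difference, assuming pandas is installed. It calls pandas.DataFrame.to_json directly (the call the solution says Ray delegates to) to show the effect of force_ascii. The helper names to_jsonl and rewrite_with_ray, the sample strings, and the directory arguments are made up for illustration. The Ray function is only defined, not run, and assumes write_json() forwards extra keyword arguments to pandas as described above.

```python
import pandas as pd


def to_jsonl(df: pd.DataFrame, force_ascii: bool) -> str:
    """Serialize a DataFrame as JSON Lines with the given force_ascii setting."""
    return df.to_json(orient="records", lines=True, force_ascii=force_ascii)


def rewrite_with_ray(src_dir: str, dst_dir: str) -> None:
    """Hypothetical helper: read JSONL with Ray and write it back unescaped."""
    import ray  # imported lazily so the pandas demo below runs without Ray

    ds = ray.data.read_json(src_dir)
    # force_ascii=False is passed through to pandas.DataFrame.to_json
    ds.write_json(dst_dir, force_ascii=False)


# Illustrative data with French accented characters
df = pd.DataFrame({"text": ["café", "naïve"]})

escaped = to_jsonl(df, force_ascii=True)     # pandas default: non-ASCII is escaped
preserved = to_jsonl(df, force_ascii=False)  # characters are written as-is

print(escaped)
print(preserved)
```

Printing both strings side by side makes it easy to confirm which form ends up in your output files before changing the Ray pipeline.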