Ray Data can run out of memory even on a machine with plenty of RAM if individual blocks are too large. What matters is per-block size, not total dataset size: Ray Data processes data block by block, and if the block count is small relative to the dataset (for example, because the underlying Iceberg table is stored in a few large files), each block can occupy many gigabytes in memory. This can trigger OOM even when you select only a few columns of short strings, because the default parallelism is derived from the source layout rather than from the projected data size. This is a known issue, and the standard mitigation is to increase the number of output blocks via the override_num_blocks parameter of read_iceberg, which splits the read into smaller, more manageable chunks. See the Ray Data performance tips and the read_iceberg API docs.
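As a rough sketch of how to pick a value for override_num_blocks: if you know (or can estimate) the table's in-memory size, you can back out the smallest block count that keeps each block under a target size. The 128 MiB target below is an illustrative assumption, not a guaranteed Ray default, and the 500 GiB table size is hypothetical.

```python
import math

def min_num_blocks(dataset_bytes: int, target_block_bytes: int) -> int:
    """Smallest block count that keeps each block at or under the target size."""
    return math.ceil(dataset_bytes / target_block_bytes)

# Hypothetical example: a 500 GiB table with a 128 MiB per-block target.
dataset_bytes = 500 * 1024**3
target_block_bytes = 128 * 1024**2
n = min_num_blocks(dataset_bytes, target_block_bytes)
print(n)  # 4000

# You would then pass this to the read, e.g.:
#   ds = ray.data.read_iceberg(..., override_num_blocks=n)
```

Erring on the high side is usually safe: more, smaller blocks cost a little scheduling overhead but keep peak per-task memory low.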