Python kernel is killed while running fine-tuning models

I'm using a Ray cluster on Kubernetes and connecting from an external Jupyter notebook.

While running my notebook to fine-tune a Hugging Face model, the kernel is killed at this step:

from ray.train.huggingface import HuggingFaceTrainer
from ray.air.config import ScalingConfig
from ray.data.preprocessors import Chain

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    trainer_init_config={
        "batch_size": 16,
        "epochs": 1,
    },
    scaling_config=ScalingConfig(
        num_workers=num_workers,
        use_gpu=use_gpu,
        resources_per_worker={"GPU": 1, "CPU": cpus_per_worker},
    ),
    datasets={"train": ray_datasets["train"], "evaluation": ray_datasets["validation"]},
    preprocessor=Chain(splitter, tokenizer),
)

results = trainer.fit()

trainer.fit() trains the model successfully, but at the end the kernel is killed after printing this warning:

UserWarning: Ray Client is attempting to retrieve a 5.53 GiB object over the network, which may be slow. Consider serializing the object to a file and using S3 or rsync instead

I’m unable to find any docs that explain how to apply the serialization workaround suggested in the warning.
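
One workaround I'm considering (a minimal sketch only, assuming Ray 2.3's RunConfig/SyncConfig import paths; the bucket path and run name are placeholders) is to have the trainer upload checkpoints to S3 so the large checkpoint never has to come back through Ray Client:

from ray.air.config import RunConfig
from ray.tune.syncer import SyncConfig

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    trainer_init_config={"batch_size": 16, "epochs": 1},
    scaling_config=ScalingConfig(
        num_workers=num_workers,
        use_gpu=use_gpu,
        resources_per_worker={"GPU": 1, "CPU": cpus_per_worker},
    ),
    run_config=RunConfig(
        name="hf-finetune",
        # Checkpoints are synced to S3 as they are produced, so the result
        # can be loaded from S3 later instead of being shipped over the
        # Ray Client connection into the notebook process.
        sync_config=SyncConfig(upload_dir="s3://my-bucket/hf-finetune"),
    ),
    datasets={"train": ray_datasets["train"], "evaluation": ray_datasets["validation"]},
    preprocessor=Chain(splitter, tokenizer),
)

results = trainer.fit()
# With upload_dir set, results.checkpoint should reference the S3 location.
print(results.checkpoint)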

Any help would be much appreciated, thanks!

Versions:
Kubernetes version: v1.25.6
Ray version: 2.3.1
Python version: 3.8

@nikhil.das What does your trainer_init_per_worker code look like? And where are the datasets defined?
I assume you're using HF datasets and then converting them into Ray Data, right? Where is all of that happening?
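
For reference, the pattern I'd expect looks roughly like this (a minimal sketch; the dataset name is a placeholder):

from datasets import load_dataset
import ray.data

# Load a Hugging Face DatasetDict (the dataset name here is only a placeholder)...
hf_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

# ...and convert it to Ray Data. In Ray 2.3, from_huggingface on a DatasetDict
# returns a dict of Ray Datasets keyed by split name, matching the
# ray_datasets["train"] / ray_datasets["validation"] usage above.
ray_datasets = ray.data.from_huggingface(hf_datasets)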

Also, where is the HuggingFace model created?
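
I'd expect the model to be created inside trainer_init_per_worker, so it is instantiated on each Ray worker rather than in the notebook. Roughly (a minimal sketch; the model name is a placeholder):

from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def trainer_init_per_worker(train_dataset, eval_dataset, **config):
    # Instantiate the model here, on the worker, not in the driver/notebook.
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    args = TrainingArguments(
        output_dir="output",
        per_device_train_batch_size=config.get("batch_size", 16),
        num_train_epochs=config.get("epochs", 1),
        evaluation_strategy="epoch",
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )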

cc: @Yard1