I am using HF Transformers to fine-tune a model on a custom dataset.
I already had it working natively with HF on a GPU, but I cannot reproduce the same results on Ray.
I pickled the train and eval datasets to make sure everything is exactly the same in the native and Ray cases.
Without Ray, it works fine and the accuracy changes across epochs. On Ray, however, the model produces exactly the same accuracy (up to the 10th decimal place) on all 10 epochs. I have verified this multiple times.
I am using the same hyperparameters for both runs. I am also training on one node with one GPU on Ray, so distributed training should not be the issue.
I load the pickled Torch datasets as shown below and then convert them to Ray datasets.
The native HF version uses the Torch datasets directly, while the Ray version uses the Ray datasets.
I noticed that Torch datasets containing tensors are converted to NumPy arrays when they are converted to Ray datasets. As far as I understand, Ray converts them back to tensors before calling the model, and it does not raise any errors.
Any thoughts about what could be the issue?
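import pickle
import ray
# TRAIN_DATA and EVAL_DATA hold the pickled bytes of the Torch datasets
train_dataset = pickle.loads(TRAIN_DATA)
dev_dataset = pickle.loads(EVAL_DATA)
# Convert the Torch datasets to Ray datasets
ray_train_ds = ray.data.from_torch(train_dataset)
ray_dev_ds = ray.data.from_torch(dev_dataset)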
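One way to check whether the conversion changes the data is to compare a sample from the Torch dataset with the matching row of the Ray dataset. The sketch below is only a rough diagnostic: `to_plain` and `diff_samples` are hypothetical helper names, and it assumes each sample is a dict whose values are tensors, NumPy arrays, lists, or scalars.

```python
def to_plain(value):
    """Turn tensors / NumPy arrays (anything with .tolist()) into plain Python values."""
    return value.tolist() if hasattr(value, "tolist") else value


def diff_samples(native_sample, ray_sample):
    """Return the sorted keys that are missing on one side or whose values differ."""
    mismatched = []
    for key in sorted(set(native_sample) | set(ray_sample)):
        if key not in native_sample or key not in ray_sample:
            mismatched.append(key)  # key exists on only one side
        elif to_plain(native_sample[key]) != to_plain(ray_sample[key]):
            mismatched.append(key)  # same key, different content
    return mismatched
```

It could be called as `diff_samples(train_dataset[0], ray_train_ds.take(1)[0])`. Depending on the Ray version, the row returned by `take` may wrap the sample under an `"item"` key, in which case that key has to be unwrapped first. The comparison ignores dtype differences (for example, `1` and `1.0` compare equal), so dtypes would need a separate check.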
Native HF version (working code):
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="models/model_base_100_pages_10_epochs_3_classes",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=dev_dataset,
    # tokenizer=tokenizer,
    # data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
Ray HF Transformers version (code works without errors but accuracy is not changing):
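from transformers import MarkupLMForSequenceClassification, Trainer, TrainingArguments
# Ray import paths below assume a Ray 2.x release that ships TransformersTrainer
from ray.air.config import CheckpointConfig, RunConfig, ScalingConfig
from ray.train.huggingface import TransformersTrainer

use_gpu = True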
def trainer_init_per_worker(train_dataset, eval_dataset, **config):
    id2label = {0: "pdp", 1: "collection", 2: "other"}
    label2id = {label: idx for idx, label in id2label.items()}
    num_labels = len(id2label)
    model = MarkupLMForSequenceClassification.from_pretrained(
        "microsoft/markuplm-base",
        id2label=id2label,
        label2id=label2id,
        num_labels=num_labels,
    )
    args = TrainingArguments(
        output_dir="page-type-classifier-v1-test",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=10,
        weight_decay=0.01,
        logging_strategy="epoch",
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=False,
        no_cuda=(not use_gpu),
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
scaling_config = ScalingConfig(num_workers=1, use_gpu=use_gpu)
trainer = TransformersTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=scaling_config,
    run_config=RunConfig(
        checkpoint_config=CheckpointConfig(
            num_to_keep=1,
            checkpoint_score_attribute="eval_loss",
            checkpoint_score_order="min",
        ),
    ),
    datasets={"train": ray_train_ds, "evaluation": ray_dev_ds},
)
result = trainer.fit()
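The `compute_metrics` function passed to both trainers is not shown above. As a rough sketch only, a hypothetical debugging variant could report how many times each class is predicted alongside the accuracy, which might help tell whether the identical accuracy comes from the model predicting a single class every epoch. The name `compute_metrics_debug` and the `num_labels=3` default are assumptions for illustration, and the sketch assumes the predictions are a plain 2-D array of logits rather than a tuple of model outputs.

```python
from collections import Counter


def compute_metrics_debug(eval_pred, num_labels=3):
    """Return accuracy plus one flat 'pred_count_<class>' entry per class."""
    logits, labels = eval_pred  # unpacks like the (predictions, label_ids) pair
    # Arg-max over each row of logits, written without NumPy so it runs anywhere
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(label) for label in labels]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    metrics = {"accuracy": correct / len(labels)}
    counts = Counter(preds)
    for class_id in range(num_labels):
        metrics[f"pred_count_{class_id}"] = counts.get(class_id, 0)
    return metrics
```

If every epoch reports the same `pred_count_*` values with all predictions in one class, the constant accuracy would simply be the share of that class in the eval set. That would not explain the cause, but it would narrow down where to look.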