Issue with 04 templete fine-tune llms

sadra · August 26, 2023, 8:14am

I am trying to run the full param training template.04_finetuning_llms_with_deepspeed
on a single 4*A100 (80GB) machine. Already made the necessary changes to load the model locally and store it locally, However, I ran into the following issue.

./run_llama_ft.sh --size=7b [--as-test]

Failure # 1 (occurred at 2023-08-26_07-55-34)
e[36mray::_Inner.train()e[39m (pid=133909, ip=10.14.0.6, actor_id=f1fc079c10f51864ae2f3ac601000000, repr=TorchTrainer)
  File "/home/sadra/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 394, in train
    raise skipped from exception_cause(skipped)
  File "/home/sadra/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 54, in check_for_failure
    ray.get(object_ref)
ray.exceptions.RayTaskError(AssertionError): e[36mray::_RayTrainWorker__execute.get_next()e[39m (pid=136753, ip=10.14.0.6, actor_id=1435c3fad5191d9a19669de301000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f06a4575030>)
  File "/home/sadra/lib/python3.10/site-packages/ray/train/_internal/worker_group.py", line 33, in __execute
    raise skipped from exception_cause(skipped)
  File "/home/sadra/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 129, in discard_return_wrapper
    train_func(*args, **kwargs)
  File "/home/sadra/ray/doc/source/templates/04_finetuning_llms_with_deepspeed/finetune_hf_llm.py", line 307, in training_function
    outputs = model(**batch)
  File "/home/sadra/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1768, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 827, in forward
    logits = self.lm_head(hidden_states)
  File "/home/sadra/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    result = hook(self, args)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 383, in _pre_forward_module_hook
    self.pre_sub_module_forward_function(module)
  File "/home/sadra/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 495, in pre_sub_module_forward_function
    param_coordinator.fetch_sub_module(sub_module, forward=True)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sadra/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 306, in fetch_sub_module
    assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()
AssertionError: {'id': 292, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0, 4096), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {453}, 'ds_tensor.shape': torch.Size([0])}

this is what I see before error happens

(pid=137497) Running: 0.0/96.0 CPU, 0.0/4.0 GPU, 3.56 MiB/1.86 GiB object_store_memory:   0%| | 20/36864 [00:00<04:52, 12023-08-26 07:55:34,707 ERROR tune_controller.py:1507 -- Trial task failed for trial TorchTrainer_c16fe_00000

I tested in various scenarios from deepspeed .8 .9.3 .10 and python 3.8 as well.
torch is 2.0.1 , transformers 4.32,
tested with both Version: 3.0.0.dev0, but with 2.6 errors were similar.

cuda version of pytorch and nvcc are matching. the machine already has 900GB Ram, I also tried disabling zero_load.

Can someone give me ideas how to fix it?

matthewdeng · August 26, 2023, 4:54pm

@kourosh can you take a look?

Chongxiao_Cao · September 7, 2023, 6:58am

Same error on our side with transformers>=4.32.0 and Ray Torch Trainer.

2376    return func(*args, **kwargs)2375  File "/usr/local/lib/python3.9/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context2374    ret_val = func(*args, **kwargs)2373  File "/usr/local/lib/python3.9/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn2372    param_coordinator.fetch_sub_module(sub_module, forward=True)2371  File "/usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/parameter_offload.py", line 494, in pre_sub_module_forward_function2370    return func(*args, **kwargs)

Chongxiao_Cao · September 7, 2023, 7:11am

It looks like related to this issue

github.com/huggingface/transformers

Assertion error when using Trainer & Deepspeed stage 3 with `model.resize_token_embeddings`

opened 07:14AM - 05 Sep 23 UTC

kai01ai

### System Info - `transformers` version: 4.33.0 - Platform: Linux-3.10.0-11…60.83.1.el7.x86_64-x86_64-with-glibc2.35 - Python version: 3.10.9 - Huggingface_hub version: 0.15.1 - Safetensors version: 0.3.2 - Accelerate version: 0.22.0 - Accelerate config: - compute_environment: LOCAL_MACHINE - distributed_type: DEEPSPEED - mixed_precision: bf16 - use_cpu: False - debug: False - num_processes: 64 - machine_rank: 0 - num_machines: 8 - main_process_ip: 127.0.0.1 - main_process_port: 29500 - rdzv_backend: static - same_network: True - main_training_function: main - deepspeed_config: {'deepspeed_multinode_launcher': 'standard', 'gradient_accumulation_steps': 1, 'gradient_clipping': 1.0, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': True, 'zero3_save_16bit_model': True, 'zero_stage': 3} - downcast_bf16: no - tpu_use_cluster: False - tpu_use_sudo: False - tpu_env: [] - PyTorch version (GPU?): 2.0.1+cu117 (True) - Tensorflow version (GPU?): not installed (NA) - Flax version (CPU?/GPU?/TPU?): not installed (NA) - Jax version: not installed - JaxLib version: not installed - Using GPU in script?: <fill in> - Using distributed or parallel set-up in script?: <fill in> ### Who can help? @pacman100 ### Information - [ ] The official example scripts - [X] My own modified scripts ### Tasks - [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...) - [X] My own task or dataset (give details below) ### Reproduction I've encountered an issue while training using the combination of Trainer & Deepspeed stage 3. When invoking model.resize_token_embeddings, an AssertError arises during training. This was not an issue in transformers version 4.31.0. However, for versions > 4.31.0 and in the main branch, this problem persists. I suspect this might be related to PR https://github.com/huggingface/transformers/pull/25394 ``` File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 310, in fetch_sub_module return func(*args, **kwargs) File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 310, in fetch_sub_module assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary() AssertionErrorassert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary() AssertionError: {'id': 292, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0, 4096), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {453}, 'ds_tensor.shape': torch.Size([0])} ``` code: ```python test.py from transformers import ( AutoConfig, AutoTokenizer, AutoModelForCausalLM, HfArgumentParser, TrainingArguments, DataCollatorForSeq2Seq, Trainer, ) def main(): parser = HfArgumentParser((TrainingArguments)) training_args, = parser.parse_args_into_dataclasses() model_path = '/path/to/Llama-2-7b-hf' config = AutoConfig.from_pretrained(model_path) model = AutoModelForCausalLM.from_pretrained( model_path, config = config, ) tokenizer = AutoTokenizer.from_pretrained( model_path, use_fast=False, model_max_length=1024, ) add_new_tokens = True if add_new_tokens: # deepspeed AssertionError tokenizer.add_special_tokens({"pad_token": "<pad>",}) model.resize_token_embeddings(len(tokenizer)) else: # it works tokenizer.pad_token = tokenizer.eos_token from datasets import Dataset def gen(): for _ in range(100): yield {"input_ids": [1, 2, 3], "labels": [1, 1, 1]} datasets = Dataset.from_generator(gen) datasets.set_format('pt') trainer = Trainer( model=model, args=training_args, tokenizer=tokenizer, data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, max_length=tokenizer.model_max_length), train_dataset=datasets, ) trainer.train() if __name__ == "__main__": main() ``` scripts: ```shell deepspeed test.py \ --deepspeed configs/zero3_hf.conf \ --output_dir output/test/ \ --do_train \ --per_device_train_batch_size 1 \ --gradient_accumulation_steps 4 \ --evaluation_strategy "no" \ --report_to "none" \ ``` deepspeed config ```conf { "bf16": { "enabled": "auto" }, "zero_optimization": { "stage": 3, "overlap_comm": true, "contiguous_gradients": true, "sub_group_size": 1e9, "reduce_bucket_size": "auto", "stage3_prefetch_bucket_size": "auto", "stage3_param_persistence_threshold": "auto", "stage3_max_live_parameters": 1e9, "stage3_max_reuse_distance": 1e9, "stage3_gather_16bit_weights_on_model_save": true }, "gradient_accumulation_steps": "auto", "gradient_clipping": "auto", "steps_per_print": 1e5, "train_batch_size": "auto", "train_micro_batch_size_per_gpu": "auto", "wall_clock_breakdown": false } ``` ### Expected behavior no error

kourosh · September 7, 2023, 4:27pm

Hey @Chongxiao_Cao

Here is the dependency that I used that worked.

torch==2.0.0
torchvision==0.15.1
torchaudio==2.0.1
git+https://github.com/huggingface/transformers.git@d0c1aeb
deepspeed==0.10.0
fairscale==0.4.13
peft==0.5.0
datasets==2.14.4
accelerate==0.21.0
evaluate==0.4.0
bitsandbytes==0.41.1
wandb==0.15.8
pytorch-lightning==2.0.6
protobuf<3.21.0
torchmetrics==1.0.3
lm_eval==0.3.0
tiktoken==0.1.2
sentencepiece==0.1.99
urllib3<1.27

Topic		Replies	Views
Ray tune + deepspeed integration Ray Tune	1	51	February 21, 2025
Use deepspeed in aviary to deploy falcon 40B / Llama 30B Fails	3	970	July 23, 2023
Support Fine-Tune a Quantized LLM Ray Data	0	499	February 21, 2024
Starting DeepSpeed Zero_Stage 3 Engine with Ray Ray Core	1	43	April 14, 2025
Tensor parallel inference with deepspeed on ray	1	123	September 27, 2024

Issue with 04 templete fine-tune llms

Related topics