[Draft] Avoid loading model weights before recipe application if any #2230

Draft · wants to merge 2 commits into base: main

Conversation

@rahul-tuli (Member) commented Apr 8, 2024

Previously, when SparseAutoModelForCausalLM.from_pretrained(...) was called, the weights were loaded twice: once during model = super(AutoModelForCausalLM, cls).from_pretrained(...) and then again after recipe application, which is undesirable.

This PR updates the flow to use from_config(...) instead of from_pretrained(...): from_config initializes the model with fresh (untrained) weight data, and the actual trained weights are loaded in only once, after recipe application.

More info on from_config: https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#transformers.AutoModel.from_config
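
For illustration, here is a minimal sketch of the from_config flow described above; the model path is borrowed from the test script below, and the trailing comments stand in for the recipe and reload steps, which this snippet does not implement:

from transformers import AutoConfig, AutoModelForCausalLM

model_path = "Xenova/llama2.c-stories15M"         # example checkpoint, see test script below
config = AutoConfig.from_pretrained(model_path)   # reads only the config, no weight tensors
model = AutoModelForCausalLM.from_config(config)  # builds the model with freshly initialized weights
# ... apply the recipe / structural changes here ...
# ... then load the trained weights back in, exactly once ...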

The initial effort was to accomplish this with accelerate.init_empty_weights, but we ran into the following issue with quantized models: https://discuss.huggingface.co/t/error-the-model-weights-are-not-tied-please-use-the-tie-weights-method-before-using-the-infer-auto-device-function-even-after-adding-model-tie-weights/46325
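
For context, the abandoned approach looked roughly like this (a sketch, not the PR's code): under accelerate's init_empty_weights the parameters are created on the meta device so no weight memory is allocated, and the tie-weights error linked above showed up on this path for quantized checkpoints.

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Xenova/llama2.c-stories15M")  # example checkpoint
with init_empty_weights():
    # parameters are placed on the "meta" device, so no real memory is allocated
    model = AutoModelForCausalLM.from_config(config)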

Tests: Loaded dense, sparse, and quantized checkpoints; all of them load correctly.

Test script:

import time
from typing import List
from sparseml.transformers import SparseAutoModelForCausalLM
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--model-type", type=str, choices=["dense", "sparse", "quantized"], default="quantized")
parser.add_argument("--all", action="store_true")

BASE_MODEL = "Xenova/llama2.c-stories15M"

# Define the model paths for each model type
models = {
    "dense": "Xenova/llama2.c-stories15M",
    "sparse": "/home/rahul/projects/sparseml/local/local_output/sparse_model_80",
    "quantized": "mgoin/llama2.c-stories15M-quant-pt",
}

def load_and_time(model_path):
    start_time = time.time()
    SparseAutoModelForCausalLM.from_pretrained(model_path)
    end_time = time.time()
    return end_time - start_time

def load_weights(model_types: List[str]):
    return {
        model_type: load_and_time(models[model_type])
        for model_type in model_types
    }

def main(args):
    timings = (
        load_weights(model_types=list(models.keys()))
        if args.all
        else load_weights(model_types=[args.model_type])
    )
    print(timings)

if __name__ == "__main__":
    args = parser.parse_args()
    main(args=args)
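
To reproduce, the script can be saved locally (the filename below is arbitrary) and run either for a single checkpoint type or for all three:

python load_timing.py --model-type quantized
python load_timing.py --all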
    

"""
Takes a loaded Pytorch model and applies any structural changes such as quantization
to the model, then reloads the model.

:param model: PyTorch model to apply structure to
:param recipe_path: path to recipe to apply to the model
:param model_path: path to model, used for reloading the state dict
:param reload_weights: flag to reload the weights after applying the recipe.
Dafault is True.
A contributor suggested a change to the docstring:

- Dafault is True.
+ Default is True.
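
For orientation, a rough sketch of what a helper with this docstring might do; the function name, checkpoint filename, and the elided recipe step are assumptions for illustration, not the PR's actual code:

import os
import torch

def apply_structure_then_reload(model, recipe_path, model_path, reload_weights=True):
    # ... apply the structural changes defined in the recipe at recipe_path
    #     (e.g. quantization wrappers) to the model; elided here ...
    if reload_weights:
        # reload the trained weights so they land in the now-matching structure
        state_dict = torch.load(os.path.join(model_path, "pytorch_model.bin"), map_location="cpu")
        model.load_state_dict(state_dict, strict=False)
    return model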

@dbogunowicz (Contributor) left a comment:
Looking good!

@@ -130,12 +135,27 @@ def skip(*args, **kwargs):
compressor.overwrite_weights(model_path=model_path, model=model)

recipe = resolve_recipe(recipe=recipe, model_path=pretrained_model_name_or_path)

# this must be done before recipe is applied
A contributor commented:
curious, why? how does this modify the state of the model?

@dbogunowicz (Contributor):

Also @rahul-tuli, the correct implementation of this PR should make this part of the from_pretrained method:

def skip(*args, **kwargs):
    pass
# Skip the initializer step. This accelerates the loading
# of the models, especially for the quantized models
torch.nn.init.kaiming_uniform_ = skip
torch.nn.init.uniform_ = skip
torch.nn.init.normal_ = skip

redundant!
