
Dependencies are wrong #19

Open
MrGranddy opened this issue Jul 18, 2023 · 3 comments

Comments

@MrGranddy

Hello, I have tried many different version combinations to make the LLaMA script work, but it produces very poor results, which is also what I observed with my own implementation and another SparseGPT LLaMA implementation.

All three of these implementations produce exactly the same results, which is reassuring because it suggests we are doing everything correctly, but the pruned LLaMA still performs remarkably poorly, even worse than BLOOM or OPT.

If your results are better, could you please share the exact dependencies needed to repeat your experiments? The transformers version given in the README does not even include the LLaMA tokenizer.

Thank you

@efrantar
Member

Hi, what do you mean by "very bad results"? As also discussed in #7, pruning LLaMA seems to be more challenging than pruning e.g. OPT, possibly because it is more parameter-efficient. I just ran --sparsity .5 on the 7B model with fairly recent package versions (transformers==4.31.0, datasets==2.13.1 and torch==2.0.1) and got 7.20 PPL on WikiText and 9.29 PPL on C4 (some package version newer than the ones we list in the README seems to have broken the PTB numbers in general, not sure why). What numbers do you get?
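For reference, the perplexity check I mean looks roughly like the sketch below. This is not the repo's exact eval code, just a minimal approximation of it: non-overlapping 2048-token windows over the WikiText-2 test split; the model path is a placeholder for whatever (pruned) checkpoint you are evaluating.

```python
# Minimal perplexity sketch (not the repo's exact code); model path is a placeholder.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # placeholder: point at your pruned checkpoint
tok = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).cuda().eval()

# Concatenate the WikiText-2 test set and split it into 2048-token windows
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen, losses = 2048, []
for i in range(ids.shape[1] // seqlen):
    batch = ids[:, i * seqlen:(i + 1) * seqlen].cuda()
    with torch.no_grad():
        # labels=batch makes the model return the mean token NLL for this window
        losses.append(model(batch, labels=batch).loss.float())

print(f"WikiText-2 PPL: {torch.exp(torch.stack(losses).mean()).item():.2f}")
```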

@MrGranddy
Author

Hello, I ran evaluations on some standard LLM evaluation tasks using the LM Evaluation Harness:
https://github.com/EleutherAI/lm-evaluation-harness
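
Roughly, the zero-shot evaluation I ran looks like the sketch below; the exact function names and model-type strings depend on the harness version, and the checkpoint path is a placeholder.

```python
# Rough sketch of the zero-shot evaluation; API details vary across harness versions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                                  # Hugging Face causal-LM backend
    model_args="pretrained=/path/to/pruned-llama-7b",   # placeholder checkpoint path
    tasks=["arc_challenge", "arc_easy", "boolq"],
    batch_size=8,
)
print(results["results"])  # per-task acc / acc_norm numbers
```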

I get the following results for LLaMA:

| LLaMA-7B | Dense | Magnitude 50% | SparseGPT 50% | SparseGPT 2:4 |
|---|---|---|---|---|
| arc_challenge (acc_norm) | 0.4138 | 0.302 | 0.2833 | 0.291 |
| arc_easy (acc_norm) | 0.5248 | 0.2702 | 0.2588 | 0.266 |
| boolq (acc) | 0.7315 | 0.6214 | 0.6193 | 0.3823 |

Normally I would expect some performance drop, but for comparison, here are the results for BLOOM-7B1:

| BLOOM-7B1 | Dense | Magnitude 50% | SparseGPT 50% | SparseGPT 2:4 |
|---|---|---|---|---|
| arc_challenge (acc_norm) | 0.3336 | 0.3072 | 0.3055 | 0.2722 |
| arc_easy (acc_norm) | 0.5728 | 0.5261 | 0.5316 | 0.4945 |
| boolq (acc) | 0.6291 | 0.6064 | 0.6303 | 0.6226 |

So there is probably something wrong with the implementation. As I mentioned, my own implementation also gets the same results, so I would like to compare against yours. Could you please run the experiments with the latest version of transformers so we can validate?

@MrGranddy
Author

Sorry, I closed the issue by accident; I would be glad if you could re-open it so we can resolve this. I also tried the experiment with multiple torch, Python, and transformers versions. If your results are better, I would expect that it only works with a very specific combination of library versions for some reason.

MrGranddy reopened this Jul 24, 2023