
Error using Functionary-small-v3.2 AWQ version with vLLM #259

Open
MadanMaram opened this issue Aug 27, 2024 · 4 comments

Comments

@MadanMaram

MadanMaram commented Aug 27, 2024

Hello Functionary team,

I'm trying to use the Functionary-small-v3.2 AWQ version with vLLM for inference, but I'm encountering an error. The vLLM library doesn't seem to recognize the 'FunctionaryForCausalLM' architecture.

Here's the specific error I'm getting:
ValueError: Model architectures ['FunctionaryForCausalLM'] are not supported for now.
I'm able to run the non-AWQ version successfully, but I'd like to use the AWQ version. Could you please provide guidance on:

  1. Is the Functionary-small-v3.2 AWQ version compatible with vLLM?
  2. Are there any special steps or configurations needed to use the AWQ version with vLLM?
  3. If vLLM doesn't support this architecture, do you have any recommendations for alternative ways to run the AWQ version of Functionary-small-v3.2?

Any information or resources you can provide would be greatly appreciated. Thank you for your help!
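
For reference, here is roughly how I'm trying to load the quantized model with vLLM (the model path is a placeholder for my local AWQ output directory, not an exact reproduction of my script):

```python
# Rough sketch of my loading code; the path below stands in for the
# directory that holds my AWQ-quantized weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="functionary-small-v3.2-awq",  # local AWQ output directory
    quantization="awq",
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The ValueError above is raised as soon as the `LLM(...)` object is constructed.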

@jeffreymeetkai
Collaborator

Hi, we do not have a functionary-small-v3.2 AWQ model currently. To help us reproduce this, may I know where you got this model from?

@MadanMaram
Author

MadanMaram commented Aug 27, 2024

Thank you for your response, and I apologize for the confusion; I should have been clearer in my initial message. I don't have an official AWQ version of functionary-small-v3.2. Instead, I quantized the model myself using the AutoAWQ library.

Here's the code I used for quantization:
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'meetkai/functionary-small-v3.2'
quant_path = 'meetkai/functionary-small-v3.2-awq'
quant_config = {
    "zero_point": True,
    "q_group_size": 128,
    "w_bit": 4,
    "version": "GEMM"
}

# Load model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    device_map='cuda'
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```
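
The remaining steps followed the standard AutoAWQ quantize-and-save flow (roughly, from memory; a sketch rather than my exact code):

```python
# Quantize with the config above and write the AWQ model to quant_path
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```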

After quantizing the model using this method, I attempted to use it with vLLM, which is when I encountered the error about the 'FunctionaryForCausalLM' architecture not being supported.
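
One more guess from my side, in case it is relevant: as far as I understand, vLLM picks the model implementation from the `architectures` field in the model's config.json, which is presumably where `FunctionaryForCausalLM` comes from in my output directory. Would overriding that field with the base model's architecture (e.g. `LlamaForCausalLM`, assuming functionary-small-v3.2 is Llama-based) be a reasonable workaround? Something like this untested sketch:

```python
# Untested idea: point vLLM at the base architecture by editing the AWQ
# output's config.json (assumes functionary-small-v3.2 is Llama-based).
import json

config_path = "functionary-small-v3.2-awq/config.json"  # local output directory
with open(config_path) as f:
    config = json.load(f)

config["architectures"] = ["LlamaForCausalLM"]  # was: FunctionaryForCausalLM

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```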

I appreciate any guidance you can provide on this matter.

@MadanMaram
Author

Are there any plans to release an official AWQ version of functionary-small-v3.2 in the future?
If so, do you have an estimated timeline for when this might be available?

@QwertyJack

In my experience, AWQ-quantized models save a significant amount of GPU memory with minimal quality loss, though I'm not sure whether the same holds for the functionary models.
