FP8 version #224

Open · themrzmaster opened this issue Jul 12, 2024 · 7 comments

Comments

@themrzmaster

Thanks for your work!
It would be nice to have FP8 versions available on HF, since vLLM has special
kernels for it and FlashAttention-3 is moving in that direction too.

Thanks

@khai-meetkai
Collaborator

Hi @themrzmaster, you mean 8-bit AWQ, right? Which version are you interested in, v2.5 or v3?

@themrzmaster
Author

v3! thanks

@localmind-ai

@themrzmaster @khai-meetkai you can live-quantize with --quantization fp8 when launching the included vLLM script, so no dedicated models are needed. The only caveat is that you still need to download the regular weights, but after that, quantization works fine. Already tested on the latest medium functionary.
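
For reference, a minimal sketch of the same idea using the vLLM Python API rather than the repo's launch script (the model name and prompt are illustrative, not confirmed by this thread; requires an FP8-capable GPU such as Hopper/Ada):

```python
# Minimal sketch: load the regular (non-quantized) weights and let vLLM
# quantize them to FP8 on the fly via quantization="fp8".
# Assumptions: vLLM is installed and the GPU supports FP8; the model
# name below is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meetkai/functionary-medium-v3.1",  # illustrative model name
    quantization="fp8",                       # live FP8 quantization
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What's the weather like in Hanoi?"], params)
print(outputs[0].outputs[0].text)
```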

@localmind-ai

@khai-meetkai one more note on AWQ quants (you probably know this already, but I wanted to mention it just in case): it's quite important that the calibration dataset aligns with the function-calling use case, so it's probably a good idea to calibrate not just on a default dataset but on one mixed with your own data (including some function-calling samples); a rough sketch follows below.

This makes AWQ quants (especially 4-bit) a bit more optimized and reliable. We tested this on some of the older medium functionary models and got better results by expanding the AWQ calibration dataset with function-calling data synthetically generated from your original model.
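
A hedged sketch of what mixed-dataset calibration could look like with the AutoAWQ library, whose calib_data parameter accepts a list of raw-text samples (the model path, prompt format, and tiny sample lists are illustrative only):

```python
# Sketch of AWQ quantization with a mixed calibration set.
# Assumptions: AutoAWQ is installed; the model path and the two tiny
# sample lists are illustrative -- a real run needs hundreds of samples.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meetkai/functionary-medium-v3.1"  # illustrative
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Mix generic text with function-calling transcripts so the activation
# statistics AWQ collects reflect the target use case.
generic_samples = ["The quick brown fox jumps over the lazy dog."]
fc_samples = [
    '<|from|>assistant\n<|recipient|>get_weather\n<|content|>{"location": "Hanoi"}'
]
calib_data = generic_samples + fc_samples

model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("functionary-medium-awq")
tokenizer.save_pretrained("functionary-medium-awq")
```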

@khai-meetkai
Collaborator

Hi @localmind-ai, thank you for the reminder! Yes, the calibration dataset should also be function-calling data. Currently, we don't have any plans for creating AWQ quants as we have more urgent tasks, but we will definitely use function-calling data for calibration if we do.

@localmind-ai

Thanks for the information @khai-meetkai! Fully understandable.

@khai-meetkai
Collaborator

@localmind-ai We have just released meetkai/functionary-medium-v3.1-fp8, using a small part of the training data as calibration data. In our evaluation, this quantized model gave almost the same results as the original model.
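
For anyone landing here, a quick sketch of loading the released checkpoint with vLLM (assuming vLLM picks up the FP8 quantization settings from the checkpoint's config, so no extra flag should be needed; FP8-capable hardware is still required):

```python
# Sketch: serve the pre-quantized FP8 checkpoint directly.
# Assumption: quantization settings are read from the checkpoint config.
from vllm import LLM

llm = LLM(model="meetkai/functionary-medium-v3.1-fp8")
print(llm.generate(["Hello"])[0].outputs[0].text)
```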
