I have 3 x 3090 GPUs (72 GB of VRAM in total) running on Linux, so I should have enough GPU memory. I'm getting this error after install.
python3 server_vllm.py --model "meetkai/functionary-small-v2.2" --host 0.0.0.0
/mnt/data/Applications/functionary/server_vllm.py:94: PydanticDeprecatedSince20: Pydantic V1 style @validator validators are deprecated. You should migrate to Pydantic V2 style @field_validator validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.6/migration/ @validator("tool_choice", always=True)
INFO 03-06 20:54:30 server_vllm.py:542] args: Namespace(host='0.0.0.0', port=8000, allow_credentials=False, allowed_origins=[''], allowed_methods=[''], allowed_headers=[''], served_model_name=None, grammar_sampling=True, model='meetkai/functionary-small-v2.2', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, disable_custom_all_reduce=False, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', max_cpu_loras=None, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
INFO 03-06 20:54:33 llm_engine.py:72] Initializing an LLM engine with config: model='meetkai/functionary-small-v2.2', tokenizer='meetkai/functionary-small-v2.2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 03-06 20:54:37 weight_utils.py:164] Using model weights format ['.safetensors']
INFO 03-06 20:55:24 llm_engine.py:322] # GPU blocks: 1699, # CPU blocks: 2048
Traceback (most recent call last):
File "/mnt/data/Applications/functionary/server_vllm.py", line 550, in <module>
engine = AsyncLLMEngine.from_engine_args(engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/Applications/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 633, in from_engine_args
engine = cls(
^^^^
File "/mnt/data/Applications/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 350, in __init__
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/data/Applications/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 393, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/graham/miniconda3/envs/Fnary/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 114, in __init__
self._init_cache()
File "/home/graham/miniconda3/envs/Fnary/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 331, in _init_cache
raise ValueError(
ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (27184). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
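The 27184-token limit in the error comes straight from the figures in the log above: vLLM's KV-cache capacity in tokens is the number of GPU blocks times the tokens per block. A quick check, using the values reported in the log:

```python
# Values taken from the log output above.
gpu_blocks = 1699   # from "# GPU blocks: 1699"
block_size = 16     # from block_size=16 in the args Namespace

# KV-cache capacity in tokens = blocks * tokens-per-block.
kv_cache_tokens = gpu_blocks * block_size
print(kv_cache_tokens)  # 27184, the number quoted in the ValueError
```

Since 27184 is less than the model's 32768-token max sequence length, the engine refuses to start.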
Hi, the default context window of the base model is 32k, so you can set max_model_len to 8k if you are using a GPU with 24GB of VRAM. As the error suggests, 24GB of VRAM is not enough to load the model with a 32k-token KV cache in vLLM. Setting max_model_len to 8k should make it work. Note that your log shows tensor_parallel_size=1, so despite having three 3090s, vLLM is only using one of them.
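Concretely, that suggestion could look like the following. This is a sketch: it assumes the server forwards the usual vLLM engine arguments as CLI flags, which the args Namespace in the log (max_model_len=None, gpu_memory_utilization=0.9, tensor_parallel_size=1) suggests it does.

```shell
python3 server_vllm.py --model "meetkai/functionary-small-v2.2" \
    --host 0.0.0.0 \
    --max-model-len 8192
```

The other knobs the error message and the args listing point at are raising --gpu-memory-utilization above its 0.9 default, or increasing --tensor-parallel-size to shard the model across more than one GPU.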