Skip to content

Commit

Permalink
update readme about archival (#406)
Browse files Browse the repository at this point in the history
SUMMARY:
* update README with information about archiving this REPO.

---------

Co-authored-by: andy-neuma <[email protected]>
  • Loading branch information
andy-neuma and andy-neuma committed Sep 4, 2024
1 parent 9daca33 commit 04da663
Showing 1 changed file with 4 additions and 35 deletions.
39 changes: 4 additions & 35 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,43 +1,12 @@
# nm-vllm

## Overview
__THIS REPO HAS BEEN ARCHIVED AS OF SEPTEMBER 2024. NEURAL MAGIC IS STILL RELEASING ENTERPRISE PACKAGES RELATED TO VLLM. OUR RELEASE REPO HAS JUST GONE PRIVATE.__

`nm-vllm` is our supported enterprise distribution of [vLLM](https://github.com/vllm-project/vllm).
To learn more about nm-vllm Enterprise, visit the [nm-vllm product page](https://neuralmagic.com/nm-vllm/).

## Installation
To contribute and to see our contributions to vLLM, visit [vLLM](https://github.com/vllm-project/vllm).

### PyPI
The [nm-vllm PyPi package](https://pypi.neuralmagic.com/simple/nm-vllm/index.html) includes pre-compiled binaries for CUDA (version 12.1) kernels. For other PyTorch or CUDA versions, please compile the package from source.

Install it using pip:
```bash
pip install nm-vllm --extra-index-url https://pypi.neuralmagic.com/simple
```

To utilize the weight sparsity features, include the optional `sparse` dependencies.
```bash
pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
```

You can also build and install `nm-vllm` from source (this will take ~10 minutes):
```bash
git clone https://github.com/neuralmagic/nm-vllm.git
cd nm-vllm
pip install -e .[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
```

### Docker

The [`nm-vllm` container registry](https://github.com/neuralmagic/nm-vllm/pkgs/container/nm-vllm-openai) includes premade docker images.

Launch the OpenAI-compatible server with:

```bash
MODEL_ID=Qwen/Qwen2-0.5B-Instruct
docker run --gpus all --shm-size 2g ghcr.io/neuralmagic/nm-vllm-openai:latest --model $MODEL_ID
```

## Models
To view the latest releases, benchmarking, models, and evaluations from Neural Magic, visit [nm-vllm-certs](https://github.com/neuralmagic/nm-vllm-certs).

Neural Magic maintains a variety of optimized models on our Hugging Face organization profiles:
- [neuralmagic](https://huggingface.co/neuralmagic)
Expand Down

0 comments on commit 04da663

Please sign in to comment.