Skip to content
@bentoml

BentoML

The easiest way to build fast and reliable AI serving systems

Welcome to BentoML 👋 Twitter Follow Slack

BentoML

What's cooking? 👩‍🍳

🍱 BentoML: The Unified Serving Framework for AI Systems

BentoML is a Python library for building online serving systems optimized for AI apps and model inference. It supports serving any model format/runtime and custom Python code, offering the key primitives for serving optimizations, task queues, batching, multi-model chains, distributed orchestration, and multi-GPU serving.

🦾 OpenLLM: Self-hosting Large Language Models Made Easy

Run any open-source LLMs (Llama 3.1, Qwen2, Phi3 and more) or custom models as OpenAI-compatible APIs with a single command. It features a built-in chat UI, state-of-the-art inference performance, and a simplified workflow for production-grade cloud deployment.

☁️ BentoCloud: Fast and scalable infrastructure for building and scaling with BentoML on the cloud

BentoCloud is the complete platform for enterprise AI teams to build and scale Compound AI systems. It brings cutting-edge AI infrastructure into your cloud environment, enabling AI teams to run inference with unparalleled efficiency, rapidly iterate on system design, and effortlessly scale in production with full observability.

Get in touch 💬

👉 Join our Slack community!

👀 Follow us on X @bentomlai and LinkedIn

📖 Read our blog

Pinned Loading

  1. BentoML BentoML Public

    The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

    Python 7k 778

  2. OpenLLM OpenLLM Public

    Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.

    Python 9.8k 619

Repositories

Showing 10 of 90 repositories
  • BentoML Public

    The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

    bentoml/BentoML’s past year of commit activity
    Python 6,989 Apache-2.0 778 153 16 Updated Sep 19, 2024
  • yatai-image-builder Public

    🐳 Build OCI images for Bentos in k8s

    bentoml/yatai-image-builder’s past year of commit activity
    Go 14 9 4 7 Updated Sep 19, 2024
  • bentoml/openllm-models’s past year of commit activity
    Python 10 3 0 0 Updated Sep 19, 2024
  • llm-router Public

    LLM Router Demo

    bentoml/llm-router’s past year of commit activity
    Python 3 1 0 1 Updated Sep 17, 2024
  • BentoVLLM Public

    Self-host LLMs with vLLM and BentoML

    bentoml/BentoVLLM’s past year of commit activity
    Python 62 11 3 3 Updated Sep 17, 2024
  • bentoml/BentoFunctionCalling’s past year of commit activity
    Python 3 1 0 1 Updated Sep 17, 2024
  • BentoSearch Public

    Search with LLM

    bentoml/BentoSearch’s past year of commit activity
    Python 2 0 0 0 Updated Sep 17, 2024
  • OpenLLM Public

    Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.

    bentoml/OpenLLM’s past year of commit activity
    Python 9,759 Apache-2.0 619 22 0 Updated Sep 16, 2024
  • bentoml-unsloth Public

    BentoML Unsloth integration

    bentoml/bentoml-unsloth’s past year of commit activity
    Python 0 0 0 0 Updated Sep 16, 2024
  • unsloth Public Forked from unslothai/unsloth

    Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

    bentoml/unsloth’s past year of commit activity
    Python 0 Apache-2.0 1,098 0 0 Updated Sep 15, 2024