Add MultiOpenAIVectorizer to allow general openai api format embeddings to be used for DSPY RM #1240

Open
wants to merge 1 commit into
base: main

hawktang
Contributor

@hawktang hawktang commented Jul 3, 2024

With the LiteLLM proxy, any embedding model that LiteLLM supports can be used in DSPy.

@arnavsinghvi11
Collaborator

Thanks for opening this PR, @hawktang. Just curious, where is the LiteLLMVectorizer being used? It seems to me it is just using the OpenAI embeddings, but I wanted to double-check here.

@hawktang
Contributor Author

Sorry for the late reply; I am traveling now.

LiteLLMVectorizer is used to call the embedding models that the LiteLLM proxy supports.

The LiteLLM proxy is used to call different LLM APIs using the OpenAI API format.

With a LiteLLM adapter, DSPy can directly support all the cloud and local models that LiteLLM supports.

Because it uses the OpenAI API format, the LiteLLMVectorizer I wrote is quite similar to the OpenAI vectorizer, except for the base_url.

Adding base_url directly as a parameter to the OpenAI embeddings would achieve the same result.

I will raise another PR if the LiteLLMVectorizer class is redundant.
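
To illustrate the point about base_url, here is a minimal sketch of a vectorizer that talks to any server exposing the OpenAI embeddings format. It is not the class from this PR: the name `OpenAIFormatVectorizer`, the `post` hook, and the helper `_http_post` are hypothetical and used only for illustration. The only assumption about the server is the OpenAI-style request (`model`, `input`) and response (`data[i].embedding`) shape.

```python
import json
import urllib.request
from typing import Callable, List, Optional

# Hypothetical type alias: (url, headers, payload) -> parsed JSON response.
PostFn = Callable[[str, dict, dict], dict]


def _http_post(url: str, headers: dict, payload: dict) -> dict:
    """Default transport: POST a JSON body using only the standard library."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class OpenAIFormatVectorizer:
    """Illustrative only: embeds texts via an OpenAI-format embeddings endpoint."""

    def __init__(
        self,
        model: str,
        base_url: str = "https://api.openai.com/v1",
        api_key: str = "",
        post: Optional[PostFn] = None,
    ):
        self.model = model
        # The base URL is the only thing that changes between OpenAI itself
        # and an OpenAI-compatible proxy.
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self._post = post or _http_post

    def __call__(self, texts: List[str]) -> List[List[float]]:
        body = self._post(
            f"{self.base_url}/embeddings",
            {"Authorization": f"Bearer {self.api_key}"},
            {"model": self.model, "input": texts},
        )
        # Sort by "index" so the vectors line up with the input texts.
        items = sorted(body["data"], key=lambda d: d["index"])
        return [item["embedding"] for item in items]
```

Pointing this at a locally running proxy would then be a matter of passing something like `base_url="http://localhost:4000"`; the host, port, and path here are assumptions about a particular deployment, not values taken from this PR.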

@okhat
Collaborator

okhat commented Aug 19, 2024

See #1357 also, may overlap?

@hawktang
Contributor Author

hawktang commented Aug 20, 2024 via email

@hawktang hawktang changed the title from "add LiteLLMVectorizer to allow more embeddings to be used" to "add MultiOpenAIVectorizer to allow general openai api format embeddings to be used" Sep 3, 2024
@hawktang hawktang changed the title from "add MultiOpenAIVectorizer to allow general openai api format embeddings to be used" to "Add MultiOpenAIVectorizer to allow general openai api format embeddings to be used" Sep 3, 2024
@hawktang
Contributor Author

hawktang commented Sep 3, 2024

I have changed the name to MultiOpenAIVectorizer to follow the new MultiOpenAI API in DSPy. Can we merge this PR so that general embedding services can be used in DSPy RM?

This should be a quick way for LM and RM to use LiteLLM before the roadmap is finished.


As of DSPy 2.4, the library has approximately 20,000 lines of code and roughly another 10,000 lines of code for tests, examples, and documentation. Some of these are clearly necessary (e.g., DSPy optimizers) but others exist only because the LM space lacks the building blocks we need under the hood. Luckily, for LM interfaces, a very strong library now exists: LiteLLM, a library that unifies interfaces to various LM and embedding providers. We expect to reduce around 6000 LoCs of support for custom LMs and retrieval models by shifting a lot of that to LiteLLM.

Objectives in this space include improved caching, saving/loading of LMs, support for streaming and async LM requests. Work here is currently led by Hanna Moazam and Sri Vardhamanan, building on a foundation by Cyrus Nouroozi, Amir Mehr, Kyle Caverly, and others.

#1357
#390

@hawktang hawktang changed the title from "Add MultiOpenAIVectorizer to allow general openai api format embeddings to be used" to "Add MultiOpenAIVectorizer to allow general openai api format embeddings to be used for DSPY RM" Sep 3, 2024
@hawktang
Contributor Author

I have changed the name to MultiOpenAIVectorizer to follow the new MultiOpenAI API in DSPy. Can we merge this PR so that general embedding services can be used in DSPy RM?

This should be a quick way for LM and RM to use LiteLLM before the roadmap is finished.

Any feedback on the update to the PR?


3 participants