feat(dspy): Experiment with adding image data with GPT-4o and Gemini #1099
Conversation
Perfect, would love for this to get slapped into main. Stumbled onto this repo from someone else and this is exactly what I am looking for.
Could we maybe add it to `azure_open_ai` too? ("Add image data if provided in kwargs")
Nice @Biebrya, thanks for providing the Azure OpenAI support! I just added your suggestions to the branch 🚀 Although this is functional, it is probably missing some docs/tests. Still, I am going to open this up for review and am looking to hear from maintainers on the approach of this API - there are a few ways to do this suggested in #624. It will be good to do a temperature check to see whether this is a viable approach, and I am happy to follow up with tests/docs if this is okay.
The Google Gemini API supports prompts with multiple images. The changes below could be added to this PR to support a list of image inputs.
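As a sketch of what multi-image support could look like on the Gemini side (this assumes the Gemini REST payload shape with a `parts` list; the helper name and structure here are illustrative, not code from the PR):

```python
def build_gemini_parts(prompt: str, images_b64: list[str]) -> list[dict]:
    """Assemble a Gemini-style `parts` list: one text part followed by one
    inline_data part per base64-encoded JPEG image (illustrative sketch)."""
    parts = [{"text": prompt}]
    for b64 in images_b64:
        parts.append({"inline_data": {"mime_type": "image/jpeg", "data": b64}})
    return parts

parts = build_gemini_parts("Compare these two photos.", ["QUJD", "REVG"])
```

Each image becomes its own content part, so the model receives them alongside the text rather than embedded in the prompt string.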
I have been looking all over for DSPy image support! Thank you, hopefully this gets added to main soon 🙏
It seems there are some merge conflicts now. Are the maintainers interested in this pull request? Happy to help with resolving the conflicts if there is potential for this to be merged.
Add support for vision data for various LLM vendors (Gemini, GPT, Azure OpenAI GPT).
This implements the feature requested in #624.
This adds the `is_image` property to `InputField`. We expect the image data to be encoded as Base64 JPEG - this is the format expected by the vendors listed above, and it is also easy to serialize / transform to other formats. This should be compatible with existing APIs and should not cause breakage.
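Since the image data is expected as Base64-encoded JPEG, producing it needs only the standard library. A minimal sketch (the helper name is my own, not part of the PR):

```python
import base64

def encode_image_b64(image_bytes: bytes) -> str:
    """Encode raw JPEG bytes as a Base64 string, the format this PR
    expects for image InputField values (illustrative helper)."""
    return base64.b64encode(image_bytes).decode("ascii")

# Dummy bytes stand in for a real JPEG file read from disk.
payload = encode_image_b64(b"\xff\xd8\xff\xe0fake-jpeg")
```

In practice the bytes would come from `open("photo.jpg", "rb").read()`, and the resulting string is what gets attached to the input field.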
To manually test this, run:
Design notes
- The same pattern could later be extended (e.g. an `is_audio` property) to handle other modalities.
- The `MIPro` optimizer is updated to accept an `example_stringify_fn`, since otherwise the images would take up a large amount of input context for the `mipro_optimizer.DatasetDescriptor` signature and run out of context. See below.
For `MIPRO`: using `example_stringify_fn` to handle large context
signature to handle large contextMost LLM only understand images which is passed in a separate content chunk, which makes referencing the example image within prompt difficult. Hypothesising one way to help is to try to provide some alt text and prompting models to describe the alt text during the chain of thought (such alt-text can be generated potentially in previous automated steps).
Also that with an image example data (which is large and frequently exceed context window limit) - it can cause MIPRO prompt to go over the context window size.
Therefore we provide the `example_stringify_fn` function, which allows custom stringification of an example. Now you can call MIPRO with: