[Feature] CallbackManager with onLLMStream and onRetrieve #8
Description
Created a `CallbackManager` module with two optional callbacks, `onLLMStream` and `onRetrieve`. This module gets passed into the `serviceContext`, and each callback is then invoked at the appropriate location. Here's a basic usage pattern:
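(A minimal sketch of the wiring; helper names like `serviceContextFromDefaults` and `asQueryEngine` are assumptions about the surrounding API surface rather than code copied from this PR.)

```typescript
import {
  CallbackManager,
  Document,
  VectorStoreIndex,
  serviceContextFromDefaults,
} from "llamaindex";

const callbackManager = new CallbackManager({
  // Fires with each parsed chunk while the LLM generates tokens.
  onLLMStream: (data) => console.log("stream:", data),
  // Fires with the scored nodes as soon as retrieval completes.
  onRetrieve: (data) => console.log("retrieved:", data),
});

// The CallbackManager rides along inside the serviceContext.
const serviceContext = serviceContextFromDefaults({ callbackManager });

const index = await VectorStoreIndex.fromDocuments(
  [new Document({ text: "Some source text." })],
  { serviceContext },
);

// Both callbacks fire during a query: onRetrieve first, then onLLMStream.
const response = await index.asQueryEngine().query("What is this about?");
console.log(response.toString());
```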
onLLMStream
Defining this callback automatically sets the OpenAI LLM to stream the response back through the callback. This does not change the return value of the `agenerate` function: the final result is still returned as expected (as an `LLMResult` object), though during the generation process the available tokens are streamed through the callback.

Processing the generator object and parsing its values into chunks, then streaming those chunks through the callback, has two benefits:

1. consumers receive ready-to-use chunks through the callback instead of having to handle the raw generator object themselves
2. a consistent return value from `agenerate` that always includes the final generated text

The streamed results have the following properties:
an `id` and a `parentId`. The `parentId` can link operations together: for example, if you use the query engine, the `parentId` for the stream and retrieved-data callbacks will be the same. This allows you to accurately identify the relationships between the callback responses.

Con for this setup: this is different from how streaming is implemented in the Python library.
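For concreteness, here is a hedged sketch of a streamed result and how `parentId` can be used to group chunks; only `id` and `parentId` are documented above, and the remaining fields are assumptions:

```typescript
// Hypothetical payload shape: only `id` and `parentId` come from the
// description above; `token` and `isDone` are illustrative assumptions.
interface StreamCallbackResponse {
  id: string; // unique id of this LLM call
  parentId: string; // shared across related callbacks (e.g. the retrieve step)
  token?: string; // decoded token(s) for this chunk (assumed)
  isDone?: boolean; // set on the final chunk (assumed)
}

// Example: group stream chunks by the operation that spawned them.
const chunksByParent = new Map<string, string[]>();

function onLLMStream(res: StreamCallbackResponse) {
  if (res.token === undefined) return;
  const chunks = chunksByParent.get(res.parentId) ?? [];
  chunks.push(res.token);
  chunksByParent.set(res.parentId, chunks);
}
```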
onRetrieve
`onRetrieve` allows us to instantly expose the retrieved list of scored nodes before synthesis finishes. This could be useful for giving users a faster feedback loop, e.g. starting to render snippets of the relevant nodes while synthesis is still running. When you couple both `onLLMStream` and `onRetrieve` (see the sketch below), you can offer a much snappier user experience.
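A hedged sketch of that coupling; the payload fields (`nodes`, `token`) and the UI helpers are illustrative assumptions, not this PR's exact API:

```typescript
import { CallbackManager } from "llamaindex";

// UI stubs for illustration only; swap in your framework's update calls.
const renderSnippets = (snippets: string[]) =>
  console.log("snippets:", snippets);
const appendToken = (token: string) => process.stdout.write(token);

const callbackManager = new CallbackManager({
  onRetrieve: (data: any) => {
    // Render retrieved snippets immediately, before synthesis finishes.
    renderSnippets(data.nodes.map((n: any) => String(n.node?.text ?? "")));
  },
  onLLMStream: (data: any) => {
    // Append tokens to the answer area as they arrive.
    if (data.token) appendToken(data.token);
  },
});
```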
Approach limitations
The main limitation right now is that `onLLMStream` doesn't distinguish between final LLM calls and intermediate calls. Streaming all of these tokens to `onLLMStream` is useful if the user wants it, but it feels potentially confusing as the default experience; I bet by default users only want to see the stream of the final LLM call and disregard the intermediate ones. I decided to punt on this fix in this PR as it's getting too large; I can follow up on it.
Type of Change
How Has This Been Tested?
Added basic integration tests: `packages/core/src/CallbackManager.test.ts`.

Test limitations: I've created an OpenAI mock for creating embeddings and generating LLM completions so that we can run integration tests without having to pay for the real API. This is good, but it also means we aren't actually testing the real API calls, which may eventually be updated with breaking changes.
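For reference, a minimal sketch of this style of mock, assuming Jest and the v3 `openai` SDK (the actual mock in `CallbackManager.test.ts` may differ):

```typescript
// Illustrative Jest mock of the openai v3 SDK; method names and response
// shapes follow that SDK, but this is not the exact mock used in the tests.
jest.mock("openai", () => ({
  Configuration: jest.fn(),
  OpenAIApi: jest.fn().mockImplementation(() => ({
    createEmbedding: jest.fn().mockResolvedValue({
      data: { data: [{ embedding: [0.1, 0.2, 0.3] }] },
    }),
    createChatCompletion: jest.fn().mockResolvedValue({
      data: {
        choices: [{ message: { role: "assistant", content: "mocked" } }],
      },
    }),
  })),
}));
```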
TODO: look more into mocking the openAI API... explore whether there are existing, well-maintained libraries that offer this service.
Checklist: