[Feature] CallbackManager with onLLMStream and onRetrieve #8
Description
Created a `CallbackManager` module with two optional callbacks, `onLLMStream` and `onRetrieve`. This module gets passed into the `serviceContext`, and each callback is then invoked at the appropriate location. Here's a basic usage pattern:
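(A minimal sketch of the wiring; helper names like `serviceContextFromDefaults` and `asQueryEngine` are assumptions about the surrounding API surface rather than code copied from this PR.)

```typescript
import {
  CallbackManager,
  Document,
  VectorStoreIndex,
  serviceContextFromDefaults,
} from "llamaindex";

const callbackManager = new CallbackManager({
  // Fires with each parsed chunk while the LLM generates tokens.
  onLLMStream: (data) => console.log("stream:", data),
  // Fires with the scored nodes as soon as retrieval completes.
  onRetrieve: (data) => console.log("retrieved:", data),
});

// The CallbackManager rides along inside the serviceContext.
const serviceContext = serviceContextFromDefaults({ callbackManager });

const index = await VectorStoreIndex.fromDocuments(
  [new Document({ text: "Some source text." })],
  { serviceContext },
);

// Both callbacks fire during a query: onRetrieve first, then onLLMStream.
const response = await index.asQueryEngine().query("What is this about?");
console.log(response.toString());
```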
onLLMStream
Defining this callback automatically sets the OpenAI LLM to stream the response back through the callback. This does not change the return value of the `agenerate` function: the final result is still returned as expected (as an `LLMResult` object), though during the generation process the available tokens are streamed through the callback.

Processing the generator object and parsing its values into chunks, then streaming those chunks through the callback, has two benefits:

1. consumers receive ready-to-use chunks through the callback instead of having to handle the raw generator object themselves
2. a consistent return value from `agenerate` that always includes the final generated text

The streamed results have the following properties:
an `id` and a `parentId`. The `parentId` can link operations together: for example, if you use the query engine, the `parentId` for the stream and retrieved-data callbacks will be the same. This allows you to accurately identify the relationships between the callback responses.

Con for this setup: this is different from how streaming is implemented in the Python library.
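For concreteness, here is a hedged sketch of a streamed result and how `parentId` can be used to group chunks; only `id` and `parentId` are documented above, and the remaining fields are assumptions:

```typescript
// Hypothetical payload shape: only `id` and `parentId` come from the
// description above; `token` and `isDone` are illustrative assumptions.
interface StreamCallbackResponse {
  id: string; // unique id of this LLM call
  parentId: string; // shared across related callbacks (e.g. the retrieve step)
  token?: string; // decoded token(s) for this chunk (assumed)
  isDone?: boolean; // set on the final chunk (assumed)
}

// Example: group stream chunks by the operation that spawned them.
const chunksByParent = new Map<string, string[]>();

function onLLMStream(res: StreamCallbackResponse) {
  if (res.token === undefined) return;
  const chunks = chunksByParent.get(res.parentId) ?? [];
  chunks.push(res.token);
  chunksByParent.set(res.parentId, chunks);
}
```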
onRetrieve
`onRetrieve` allows us to instantly expose the retrieved list of scored nodes before synthesis finishes. This could be useful for giving users a faster feedback loop, e.g. starting to render snippets of the relevant nodes while synthesis is still running. When you couple both `onLLMStream` and `onRetrieve` (see the sketch below), you can offer a much snappier user experience.
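A hedged sketch of that coupling; the payload fields (`nodes`, `token`) and the UI helpers are illustrative assumptions, not this PR's exact API:

```typescript
import { CallbackManager } from "llamaindex";

// UI stubs for illustration only; swap in your framework's update calls.
const renderSnippets = (snippets: string[]) =>
  console.log("snippets:", snippets);
const appendToken = (token: string) => process.stdout.write(token);

const callbackManager = new CallbackManager({
  onRetrieve: (data: any) => {
    // Render retrieved snippets immediately, before synthesis finishes.
    renderSnippets(data.nodes.map((n: any) => String(n.node?.text ?? "")));
  },
  onLLMStream: (data: any) => {
    // Append tokens to the answer area as they arrive.
    if (data.token) appendToken(data.token);
  },
});
```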
Approach limitations
The main limitation right now is that `onLLMStream` doesn't distinguish between final LLM calls and intermediate calls. Streaming all of these tokens to `onLLMStream` is useful if the user wants it, but it feels potentially confusing as the default experience; I bet by default users only want to see the stream of the final LLM call and disregard the intermediate ones. I decided to punt on this fix in this PR as it's getting too large; I can follow up on it.
Type of Change
How Has This Been Tested?
Added basic integration tests: `packages/core/src/CallbackManager.test.ts`.

Test limitations: I've created an OpenAI mock for creating embeddings and generating LLM completions so that we can run integration tests without having to pay for the real API. This is good, but it also means we aren't actually testing the real API calls, which may eventually be updated with breaking changes.
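For reference, a minimal sketch of this style of mock, assuming Jest and the v3 `openai` SDK (the actual mock in `CallbackManager.test.ts` may differ):

```typescript
// Illustrative Jest mock of the openai v3 SDK; method names and response
// shapes follow that SDK, but this is not the exact mock used in the tests.
jest.mock("openai", () => ({
  Configuration: jest.fn(),
  OpenAIApi: jest.fn().mockImplementation(() => ({
    createEmbedding: jest.fn().mockResolvedValue({
      data: { data: [{ embedding: [0.1, 0.2, 0.3] }] },
    }),
    createChatCompletion: jest.fn().mockResolvedValue({
      data: {
        choices: [{ message: { role: "assistant", content: "mocked" } }],
      },
    }),
  })),
}));
```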
TODO: look more into mocking the openAI API... explore whether there are existing, well-maintained libraries that offer this service.
Checklist: