
[Feature] CallbackManager with onLLMStream and onRetrieve #8

Merged: 7 commits into main on Jul 10, 2023

Conversation


@ajamjoom ajamjoom commented Jul 5, 2023

Description

Created a CallbackManager module with two optional callbacks, onLLMStream and onRetrieve. The manager gets passed into the serviceContext, and each callback is then invoked at the appropriate point in the pipeline.

Here's a basic usage pattern:

    const callbackManager = new CallbackManager({
      onLLMStream: (data) => {
        // TODO
      },
      onRetrieve: (data) => {
        // TODO
      },
    });
 
    const serviceContext = serviceContextFromDefaults({
      callbackManager
    });

    const index = await VectorStoreIndex.fromDocuments(
      [document],
      undefined,
      serviceContext
    );

onLLMStream

Defining this callback automatically switches the OpenAI LLM to stream its response back through the callback. This does not change the return value of the agenerate function: the final result is still returned as expected (as an LLMResult object), while the tokens produced during generation are streamed through the callback.

Processing the generator object and parsing its values into chunks, then streaming those chunks through the callback, has two benefits (see the sketch below):

  1. It abstracts away the generator's required parsing logic (more on this here)
  2. It maintains a single, simple return type for agenerate that always includes the final generated text
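
A minimal sketch of this pattern, assuming a plain async token generator; the generator type and callback payload shape below are illustrative, not the PR's actual code. Each chunk is forwarded to the callback as it arrives, and the concatenated text is still returned to the caller.

    // Simplified sketch only; the generator type and callback payload shape
    // here are assumptions, not the PR's actual implementation.
    async function consumeStream(
      tokenStream: AsyncIterable<string>,
      onLLMStream?: (data: { token: string; index: number; isDone: boolean }) => void
    ): Promise<string> {
      let text = "";
      let index = 0;
      for await (const token of tokenStream) {
        text += token;
        // Forward each chunk to the callback as it arrives.
        onLLMStream?.({ token, index: index++, isDone: false });
      }
      // Signal completion, then return the full text as the normal result.
      onLLMStream?.({ token: "", index, isDone: true });
      return text;
    }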

The streamed results have the following properties (a handler sketch follows the list):

  1. They carry a trace object, which currently only includes an id and a parentId. The parentId links operations together: for example, if you use the query engine, the stream and retrieve callbacks share the same parentId. This allows you to accurately identify the relationships between callback responses.
  2. An index is added. This is useful when users stream to the client, where in some setups the order in which the server dispatches objects is not guaranteed to match the order in which the client receives them. Adding an index to each object lets users fix any ordering issues on the client if needed.
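
As an illustration of how a consumer might use these two fields, here is a hedged sketch of an onLLMStream handler. The field access below (trace, index, token) follows the description above, but the exact payload type in the package may differ.

    // Illustrative only: `trace`, `index`, and `token` follow the description
    // above; the real payload type may differ.
    const chunksByParent = new Map<string, string[]>();

    const callbackManager = new CallbackManager({
      onLLMStream: (data: any) => {
        const { trace, index, token } = data;
        const chunks = chunksByParent.get(trace.parentId) ?? [];
        chunks[index] = token; // the index restores ordering if chunks arrive out of order
        chunksByParent.set(trace.parentId, chunks);
      },
    });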

Con of this setup:
This differs from how streaming is implemented in the Python library.

onRetrieve

onRetrieve exposes the retrieved list of scored nodes as soon as retrieval finishes, before synthesis completes. This is useful for offering a faster feedback loop, for example by rendering snippets of the relevant nodes while synthesis is still running.

When you combine onLLMStream and onRetrieve, you can offer a much snappier user experience.
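
For example, a hedged sketch of wiring both callbacks together; renderSnippets and appendAnswerToken are hypothetical UI helpers, not part of the package.

    // Hypothetical UI helpers used only for this sketch.
    declare function renderSnippets(data: unknown): void;
    declare function appendAnswerToken(data: unknown): void;

    const callbackManager = new CallbackManager({
      onRetrieve: (data) => {
        // Show the retrieved source nodes as soon as they come back,
        // before the answer has finished synthesizing.
        renderSnippets(data);
      },
      onLLMStream: (data) => {
        // Stream answer tokens into the UI as they are generated.
        appendAnswerToken(data);
      },
    });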

Approach limitations

The main limitation right now is that onLLMStream doesn't distinguish between final LLM calls and intermediate ones. Streaming all of these tokens through onLLMStream is useful for users who want it, but it could be confusing as the default experience; by default, users likely only want the stream of the final LLM call and would disregard the intermediate ones.

I decided to punt on this in this PR since it is already getting large; I can follow up on it.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update (@yisding are we tracking required documentation updates somewhere?)

How Has This Been Tested?

Added basic integration tests: packages/core/src/CallbackManager.test.ts.

Test limitations: I've created an OpenAI mock for creating embeddings and generating LLM completions so that we can run integration tests without paying for the real API. This is good, but it also means we aren't actually exercising the real API, which may eventually introduce breaking changes.

TODO: look more into mocking the OpenAI API and explore whether there are existing, well-maintained libraries for this.

  • Added new unit/integration tests that use an OpenAI mock
  • I ran the unit tests without the OpenAI mock to make sure they also pass against the real API
  • I stared at the code and made sure it makes sense

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ajamjoom ajamjoom marked this pull request as ready for review July 5, 2023 05:41
@yisding yisding commented Jul 5, 2023

Thanks for doing this, Abdul! The code looks good, but let's find some time to talk about this today. Since I'm very unfamiliar with the CallbackManager as a whole, I'd like to figure out how we can use it across the package to enable streaming in all the places the user expects it.

@yisding yisding left a comment

Thanks for doing this. This looks good.

@yisding yisding merged commit 2f468ab into main Jul 10, 2023