
Memory Leak in vectorStores.fileBatches.uploadAndPoll #1052

Open
mezozawahra opened this issue Sep 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@mezozawahra

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

I use an S3 bucket's getObject to read a file as a stream and then upload it to a vector store with vectorStores.fileBatches.uploadAndPoll. Memory usage is expected to rise during the upload and return to its baseline after it finishes, but it doesn't: after uploading a 22 MB file, memory usage went from 300 MB to 360 MB and never returned to baseline.
Note 1: the upload runs inside a POST request handler in Express.js, and memory usage remained high even after the JSON response was returned.
Note 2: when I instead saved the file to disk using "fs", memory was consumed during the operation and then returned to baseline, so I believe the bug is in the OpenAI SDK.
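
To make the before/after numbers reproducible, heap usage can be sampled around the upload call. This is a minimal sketch, not part of the SDK; `withHeapLog` is a hypothetical helper, and forcing GC requires running node with --expose-gc:

```typescript
// Convert bytes to whole megabytes for readable logging.
function mb(bytes: number): number {
  return Math.round(bytes / 1024 / 1024);
}

// Hypothetical helper: log heapUsed before and after an async operation,
// forcing a GC pass on each side when available (node --expose-gc).
async function withHeapLog<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const gc = (globalThis as any).gc as (() => void) | undefined;
  if (gc) gc(); // only defined when node is started with --expose-gc
  const before = process.memoryUsage().heapUsed;
  const result = await fn();
  if (gc) gc();
  const after = process.memoryUsage().heapUsed;
  console.log(`${label}: heap ${mb(before)} MB -> ${mb(after)} MB`);
  return result;
}
```

Wrapping the uploadAndPoll call in `withHeapLog("upload", ...)` would show whether heapUsed settles back toward its pre-upload value once the request completes.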

To Reproduce

  • this is the Express.js request handler; req.params.id is the file key saved in the S3 bucket
  • the file is uploaded to OpenAI successfully
import OpenAI, { toFile } from "openai";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { Request, Response, NextFunction } from "express";
import "dotenv/config";

export async function post(req: Request, res: Response, next: NextFunction) {
    try {

        let client = new S3Client({
            forcePathStyle: true,
            credentials: {
                accessKeyId: process.env.S3_ACCESS_KEY_ID!,
                secretAccessKey: process.env.S3_SECRET_ACCESS_KEY!,
            },
            endpoint: process.env.S3_ENDPOINT!,
            region: "us-east-1"
        })

        const params = {
            Bucket: "first-bucket",
            Key: req.params.id,
        }

        const command = new GetObjectCommand( params );
        const response = await client.send( command );
        
        const openai = new OpenAI({
            apiKey: process.env['OPENAI_API_KEY'],
        });

        await openai.beta.vectorStores.fileBatches.uploadAndPoll(
            <insert vector store ID here>,
            { files: [await toFile(response.Body!, req.params.id)] },
        );
        
        return res.json({success: true})
    } catch(error){
        next(error)
    }
}
import { post } from "./controller/files"
router.post("/:id", post)
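
Note 2 above reports that buffering the file to disk with "fs" lets memory return to baseline. A minimal sketch of that workaround, assuming the `response.Body` stream from the handler above and a hypothetical helper name `bufferToDisk` (and a key that is a plain file name, not a nested path):

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
import { Readable } from "node:stream";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical workaround sketch: stream the S3 object body to a temp file
// instead of holding it in memory, and return the temp file's path.
async function bufferToDisk(body: Readable, key: string): Promise<string> {
  const tmpPath = join(tmpdir(), key); // assumes key is a plain file name
  await pipeline(body, createWriteStream(tmpPath));
  return tmpPath;
}
```

The handler could then pass fs.createReadStream(tmpPath) to toFile before calling uploadAndPoll, and unlink the temp file afterwards; this mirrors the "save to disk" path the reporter says did not leak.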

Code snippets

No response

OS

Windows 11

Node version

v19.8.1

Library version

4.56.0

@mezozawahra mezozawahra added the bug Something isn't working label Sep 9, 2024
@RobertCraigie
Collaborator

Thanks for the report. Could you try again with the latest SDK version? We recently fixed a separate bug with file uploads, so this one might have been fixed as well.

@mezozawahra
Author

mezozawahra commented Sep 9, 2024

@RobertCraigie I tried the latest version (^4.58.1). Memory consumption is reduced: instead of 60 MB it used 44 MB. I sent the Express.js POST request twice (so uploadAndPoll ran twice), and the web app's memory usage reached 390 MB and stayed at that level.

Update: I tried it again. Here are before-and-after screenshots of memory usage:
[screenshot: memory usage before upload]
[screenshot: memory usage after upload]

@RobertCraigie
Collaborator

RobertCraigie commented Sep 9, 2024

Thanks, we'll investigate.
