Streaming responses #3
Hi, apologies for the wait, I was super swamped with work! TL;DR is that it would be difficult to handle anything past sending the request to the server, because openai-node doesn't support event streams; streaming is only supported in the Python library afaik, and this repo is written in Node.js. There's a great thread at openai/openai-node#18 that explains it. Basically, parsing the returned data is extremely complicated, especially for a POST request, and there doesn't yet seem to be a good way to get it to the frontend without reassembling all the chunks first, which kinda defeats the purpose.
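For context, with streaming enabled the API responds with a series of server-sent-event lines that have to be split and JSON-parsed by hand; abridged, the raw data looks roughly like this:

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"}}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"}}]}
data: [DONE]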
So far, no one in this community thread has a great solution to the problem: https://community.openai.com/t/how-to-stream-response-in-javascript/7310/20. The parameter itself is easy enough: per the docs, you just add stream to the request body next to temperature and everything else.
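A sketch of what that request body would look like (the fields other than stream are just placeholders for whatever the repo already sends):

// Sketch only: model, messages and temperature stand in for the repo's existing request fields.
const data = {
  model: "gpt-3.5-turbo",
  messages: conversationHistory,
  temperature: 0.7,
  stream: true, // the only new parameter; everything below is about handling what comes back
};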
Also, I've already been handling everything in a messageContent field, so some pretty severe modifications would be needed to change variable names, parsing and the like, because according to the OpenAI Cookbook, streaming completions put the text in a delta field on each chunk rather than in a single message. This is the snippet that would be used in this repo; it currently works when I tested it, but it doesn't actually stream to the frontend, and I'm wary of implementing it while the image endpoint is still a bit scuffed:

try {
  const response = await axios.post('https://api.openai.com/v1/chat/completions', data, { headers, responseType: 'stream' });
  let buffer = '';

  response.data.on('data', (chunk) => {
    buffer += chunk.toString(); // Accumulate the chunks in a buffer
  });

  // Only parse once the whole response has arrived, so nothing is actually streamed onward.
  response.data.on('end', () => {
    try {
      // Each streamed line looks like `data: {...}`; the stream ends with `data: [DONE]`.
      const lines = buffer.split('\n');
      let messageContent = '';
      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const jsonString = line.substring(6).trim();
          if (jsonString !== '[DONE]') {
            const parsedChunk = JSON.parse(jsonString);
            messageContent += parsedChunk.choices.map(choice => choice.delta?.content).join('');
          }
        }
      }

      const lastMessageContent = messageContent;
      if (lastMessageContent) {
        // Add assistant's message to the conversation history
        conversationHistory.push({ role: "assistant", content: lastMessageContent.trim() });
        // Send this back to the client
        res.json({ text: lastMessageContent.trim() });
      } else {
        // Handle no content scenario
        res.status(500).json({ error: "No text was returned from the API" });
      }
    } catch (parseError) {
      console.error('Error parsing complete response:', parseError.message);
      res.status(500).json({ error: "Error parsing the response from OpenAI API" });
    }
  });
} catch (error) {
  console.error('Error calling OpenAI API:', error.message);
  if (error.response) {
    console.error(error.response.data);
  }
  res.status(500).json({ error: "An error occurred when communicating with the OpenAI API.", details: error.message });
}
});

Am I correct in thinking that your motivation for streamed responses is to have the completions appear in chunks like on the ChatGPT interface, while still being added to the conversation history as a single response and handled server-side in the same way? Let me know if you'd like to fork it yourself, or whether I should otherwise close the issue, since I believe their documentation doesn't address replicating what they have on their Chat interface exactly, only this example:

import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ],
    stream: true,
  });

  for await (const chunk of completion) {
    console.log(chunk.choices[0].delta.content);
  }
}
main();

Even if each chunk were parsed and sent to the frontend individually, it would not be fast enough to make much of a meaningful difference; the chat completion chunk object is documented in their API reference, and handling it would mean quite the workaround.
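If anyone wants to experiment anyway, here's a very rough sketch of what forwarding each chunk to the client as it arrives might look like, loosely following the v4 suggestions at the end of that openai-node thread. Nothing here matches this repo's actual setup: the Express server, the /chat route, the SSE wiring and the history handling are all placeholder assumptions.

// Rough sketch only: assumes the openai v4 SDK and an Express-style server,
// neither of which this repo currently uses this way.
import OpenAI from "openai";
import express from "express";

const app = express();
const openai = new OpenAI();
const conversationHistory = []; // placeholder for however the repo tracks history

app.post("/chat", async (req, res) => {
  // Server-sent-events headers so the browser can read chunks as they arrive.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }], // placeholder messages
    stream: true,
  });

  let fullText = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    fullText += delta;
    // Forward each piece to the frontend immediately instead of buffering.
    res.write(`data: ${JSON.stringify({ text: delta })}\n\n`);
  }

  // The conversation history still gets one complete assistant message.
  conversationHistory.push({ role: "assistant", content: fullText.trim() });
  res.write("data: [DONE]\n\n");
  res.end();
});

app.listen(3000);

The frontend would still have to read that stream itself; EventSource only supports GET, so a POST route like this means reading the response body with a fetch ReadableStream, which is the kind of awkwardness the community thread above is about.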
Edit: You may want to look at the end of that openai-node thread; it has some suggestions for an implementation on v4 of the SDK, so maybe something along the lines of the sketch above could work, but I'm not too sure it's feasible for something this small, sorry. Apologies if I've missed anything obvious; I'm new to this, but feel free to submit a PR if you figure it out! @mihau12
Hello! Thanks for your work! Is it possible to add the stream=true parameter to have streamed responses?