I’m working with the Lambda function URLs solution suggested by @curt.kennedy for my own use case, but the function output isn’t what I’m expecting. @dmiskiew, did you get your solution to a point (or was it part of your use case) where the Lambda response stream chunks were identical to the chunks coming from the OpenAI API?
My issue is that, when calling the Lambda function URL, I’m getting chunks that are made up of multiple OpenAI response chunks (and often a single OpenAI chunk is split across two Lambda response chunks, which makes the data very hard to parse reliably on my client). For example, here are some chunks I’m receiving from my Lambda function response:
Chunk #1:
{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"A"},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"ye"},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":","},"finish_reason":null}]}{"id":"chat
Chunk #2:
cmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" I"},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" have"},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" come"},"finish_reason":null}]}{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":" across"}
…and so on until the whole response is returned. What I’m expecting is individual chunk objects like the ones the OpenAI API returns, e.g.:
Chunk #1:
{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"A"},"finish_reason":null}]}
Chunk #2:
{"id":"chatcmpl-8SY6q514tz0hzLJids92OS7Lv8Q0E","object":"chat.completion.chunk","created":1701814992,"model":"gpt-3.5-turbo-0613","system_fingerprint":null,"choices":[{"index":0,"delta":{"content":"ye"},"finish_reason":null}]}
Here is my Lambda function code:
import OpenAI from 'openai';
import util from 'util';
import stream from 'stream';

const { Readable, Transform } = stream;
const pipeline = util.promisify(stream.pipeline);

/* global awslambda */
export const handler = awslambda.streamifyResponse(
  async (event, responseStream, _context) => {
    const body = JSON.parse(event.body);

    const openai = new OpenAI({
      apiKey: 'xxx',
    });

    // request a streaming completion; the SDK returns an async iterable
    // of parsed chunk objects
    const response = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: body.messages,
      stream: true,
    });

    // wrap the SDK's async-iterable stream in a Readable so it can be piped
    const requestStream = Readable.from(response);

    // safely pipe the OpenAI stream to the function response stream
    await pipeline(requestStream, responseStream);
  }
);
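One idea I’ve been toying with (untested sketch, reusing the imports and variables above) is to stop relying on HTTP chunk boundaries altogether and frame every chunk myself, e.g. with a Transform that writes each SDK chunk object as an SSE-style data: line, so the client can split on the blank-line delimiter no matter how Lambda coalesces or splits the writes. It would replace the last two lines of the handler:

// untested sketch: add an explicit delimiter per chunk instead of relying
// on HTTP chunk boundaries, which don't seem to be preserved
const framer = new Transform({
  writableObjectMode: true, // the SDK stream yields parsed chunk objects
  transform(chunk, _encoding, callback) {
    callback(null, `data: ${JSON.stringify(chunk)}\n\n`);
  },
});
await pipeline(Readable.from(response), framer, responseStream);

But even if that works, I’d still like to understand why the plain pipe doesn’t preserve the chunking.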
I haven’t been able to figure this one out, but I’m new to streaming with Node, so maybe there’s something obvious I’m missing.