Do you use API streaming? Any difficulties with that?

I personally love streaming: it lets me show results straight away and reduces perceived latency.

But I found it a bit tedious to implement.

We were calling OpenAI from a GCP Cloud Function, which doesn’t support streaming responses in any form (neither Transfer-Encoding: chunked, nor server-sent events, nor WebSockets).

So we had to deploy our streaming service to Cloud Run instead.
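
For anyone curious what the Cloud Run side can look like, here’s a minimal sketch of an Express server that proxies an OpenAI stream as server-sent events. The route, model, and setup are my own illustration, not our actual service:

```ts
import express from "express";
import OpenAI from "openai";

const app = express();
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.get("/stream", async (req, res) => {
  // SSE headers; Cloud Run supports streamed responses, Cloud Functions don't.
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // hypothetical model choice
    messages: [{ role: "user", content: String(req.query.prompt ?? "") }],
    stream: true,
  });

  // Forward each token delta to the browser as it arrives.
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    if (delta) res.write(`data: ${JSON.stringify(delta)}\n\n`);
  }
  res.write("data: [DONE]\n\n");
  res.end();
});

// Cloud Run injects the port via the PORT environment variable.
app.listen(Number(process.env.PORT) || 8080);
```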

Then, we also return JSON data rather than plain text, and a partially streamed response isn’t well-formed JSON, so JSON.parse() fails on it. Luckily I discovered an optimistic JSON parser (best-effort-json-parser on npm) that solved this problem.
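
For illustration, here’s a minimal sketch of the client-side accumulation, assuming best-effort-json-parser’s parse export; the onChunk/render names are hypothetical, not part of the library:

```ts
import { parse } from "best-effort-json-parser";

let buffer = "";

// Feed every streamed text chunk in here; parse() makes a best-effort
// pass over the incomplete JSON accumulated so far instead of throwing
// the way JSON.parse() would on truncated input.
function onChunk(chunk: string) {
  buffer += chunk;
  render(parse(buffer));
}

// Hypothetical UI hook; replace with your own rendering.
function render(data: unknown) {
  console.log(data);
}
```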

Then I thought there could be a better way, and packaged it as a JavaScript SDK and a cloud service (https://aistream.dev/), so that you don’t need to build anything to use streaming. You can find a demo and code examples on that website.

But… I’m not sure if it’s really a big problem. People seem to be happily using streaming already.

What do you guys think? Is that type of SDK/service something useful*, or would you build your own streaming?

* Bear in mind it’s not production-ready, as you would expose your OpenAI key to the public.

Yeah, streaming is a big problem for a lot of people; whenever I check npm, someone new is rolling their own way of handling streaming. There’s actually a popular thread about streaming in the OpenAI community, with different implementations (I’ve tried out most of them, and they do have shortcomings).

So yeah, I’d say it’s a pretty big problem. Incidentally, using OpenAI streaming with LangChain makes it a breeze to implement; you then only have to worry about stream jank on the destination/client.
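
For example, here’s a minimal sketch of token streaming with LangChain JS, assuming the @langchain/openai package and an OPENAI_API_KEY in the environment; the prompt is just an illustration:

```ts
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ temperature: 0 });

// .stream() comes from LangChain's Runnable interface and yields
// message chunks as tokens arrive from the API.
const stream = await model.stream("Write one sentence about streaming.");
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content));
}
```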

Really love the landing page for aistream.dev! Especially since users can try it directly in the browser.

PS: email validation isn’t implemented: you can click the button without entering an email address and still receive the ‘Thanks for your interest!’ message.

Hi,

Thank you! I’ve abandoned the idea though; it didn’t generate enough interest. Plus, packaging it as a cloud service could create security risks: you have to expose an API key in the frontend.

That said, I’ve recently released a library that helps process streamed JSON: npmjs.com/package/http-streaming-request

I have implemented an OpenAI API-compatible proxy in AWS Lambda, using a response-streaming function.

You can switch between OpenAI and Mistral interchangeably by updating the server prefix in the API Gateway URL.
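
For illustration, here’s roughly how a client could target such a proxy using the official openai SDK’s baseURL option; the gateway URLs, path prefixes, and key variable are hypothetical placeholders, not the repo’s actual routes:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PROXY_API_KEY, // hypothetical; whatever the proxy expects
  baseURL: "https://<api-id>.execute-api.eu-west-1.amazonaws.com/openai/v1",
  // ...or point at Mistral through the same proxy:
  // baseURL: "https://<api-id>.execute-api.eu-west-1.amazonaws.com/mistral/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini", // must match the selected upstream provider
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
```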

Lambda still buffers some of the chunks for performance reasons, but responsiveness in the frontend was still pretty good, despite Lambda not flushing every single intermediate chunk.
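
For anyone who wants a feel for the shape of it, here’s a minimal sketch of a Node.js streaming handler using the awslambda.streamifyResponse global that the Lambda runtime provides when response streaming is enabled; the routing and names are my own illustration, not the repo’s code:

```ts
// The Lambda Node.js runtime injects this global; declared here for TypeScript.
declare const awslambda: {
  streamifyResponse(
    fn: (event: any, responseStream: NodeJS.WritableStream) => Promise<void>,
  ): any;
};

export const handler = awslambda.streamifyResponse(
  async (event, responseStream) => {
    // Forward the client request to the upstream API (Node 18+ global fetch).
    const upstream = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: event.body ?? "{}",
    });

    // Relay chunks as they arrive; Lambda may still coalesce small writes,
    // which matches the buffering described above.
    for await (const chunk of upstream.body as any) {
      responseStream.write(chunk);
    }
    responseStream.end();
  },
);
```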

Feel free to deploy your own Lambda using the GitHub repo lambda-openai-proxy. Let me know if you have any feedback …