Edge Pass 4096 Token Limit. Systematic approach to sent API call

i believe the Stream = True, allow us to send multiple call, as 1 prompt.
currently GPT 3.5Turbo only allowed Context in prompt. with 4096 single limit,
we quickly run into limit as the conversation continue ( if you want GPT to aware the conversation history )

My idea :

  • 1st call = context
  • 2nd call = bot instruction, limitation
  • 3rd and Last call = summarize of previous converstation. ( limited to last 2 hours )

and get response from OpenAI.

But i have a problem.
I m not sure how to use data:[DONE] in my call.

Hopefully any senior coder here could give this stream = true a try. and share with me how to end the stream. Thanks in advance.

I found the more detail doc from githut

I misunderstood. the stream is actual reverse direction

meaning, if the completion is “stream” to your server,
so you can display the result like " ChatGPT " ( typewriter effects )

1 Like


As I tried to explain to you earlier :+1:

Output steaming, not input.


1 Like


the 4096 was like the barrier of Normal Goku between SSJ Goku…

You are learning very fast for a novice coder.

Keep it up!


1 Like

thanks for the compliment. it boost me!!

would like to seek your advice on summarize conversation, and compile into next API call.

  • call GPT3.5-turbo, pass only the conversation history to summarize it.
  • save the summary into my DB - chat_session > summary_so_far
  • next call, include the summary as context.

– only send summarize request on every 5th step ( count the user input entry )

So far in my test, I creating a sales agent, with context also include most FAQ.
even without knowing previous chat, the agent still handle well.
so, I guess do the summary call every 5th step would be enough. and save some token.

What do you think?