I am seeking assistance regarding an issue I encountered while integrating ChatGPT into my mobile app, which is a flyer design application. I need guidance on how to handle long conversations and token limits efficiently. As the conversation history grows, it becomes challenging to include the entire history in each API call due to token constraints. Truncating or summarizing the conversation affects context, and message-level pagination adds complexity. I am looking for best practices to manage long conversations while considering user context, strategies to handle token usage effectively, and any recommended approaches or techniques to overcome token limitations in a mobile app context. Your expert guidance would greatly help me improve the user experience within my app.
ChatGPT uses a simple rolling window with a first-in, first-out buffer. I don't think there is a more token-efficient or cost-effective way of doing it; if you try to summarise, you will simply use up tokens on the summarisation each time.
Perhaps take advantage of the new GPT-3.5 16k model to give users longer contexts, but ultimately the problem of context size and conversation context maintenance is a large and complex area of study with no simple answers.
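For the record, a rolling window of that sort is easy to implement on your side: keep the system message, append each turn, and drop the oldest turns once a token budget is exceeded. A minimal sketch, assuming the tiktoken package for counting; the model name and budget figure are placeholders:

```python
# Sketch of a first-in, first-out rolling window over chat messages.
# Assumes the tiktoken package; model name and token budget are illustrative.
import tiktoken

def trim_to_window(messages, model="gpt-3.5-turbo", budget=3000):
    """Drop the oldest non-system messages until the history fits the budget."""
    enc = tiktoken.encoding_for_model(model)

    def count(msgs):
        # Rough per-message count; the real per-message overhead is a few tokens.
        return sum(len(enc.encode(m["content"])) + 4 for m in msgs)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and count(system + rest) > budget:
        rest.pop(0)  # FIFO: discard the oldest turn first
    return system + rest
```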
If you are stating that as a simplification, then I have no disagreement, as long as you note that it is a simplification. But if you are stating that as how it actually works, then please cite a reference.
My understanding is that a summary of each prompt and completion is passed as part of the next prompt. So something important that appears in the first prompt may still be used several prompts later, while something from the previous prompt may not make it into the summary, and thus the latest prompt seems not to know about it.
Please check this JSON in my request:
```json
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the best way to design a flyer for a summer event?"},
    {"role": "assistant", "content": "You can start by selecting vibrant colors and incorporating summer-themed elements like sun, beach, or BBQ."},
    {"role": "user", "content": "Should I include a QR code on the flyer for ticket registration?"},
    {"role": "assistant", "content": "Including a QR code can be a convenient way for attendees to register. Make sure it's placed prominently and easily scannable."},
    {"role": "user", "content": "What are some good font choices for a professional flyer?"},
    {"role": "assistant", "content": "For a professional flyer, you can consider using clean and modern fonts like Helvetica, Arial, or Montserrat."},
    {"role": "user", "content": "How should I arrange the text and images on the flyer?"},
    {"role": "assistant", "content": "A balanced layout with clear hierarchy is essential. Place important information prominently and use images that complement the overall design."}
  ]
}
```
Every time the user asks a question, I append the previous chat, and this increases the cost. Is there any alternative that avoids sending the previous messages, where I only send an ID for the conversation and the next response is based on that?
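There is no built-in conversation ID you can send: the chat completions endpoint is stateless, so the full history has to go over the wire on every call. What you can do is keep the history on your own backend and have the mobile app send only an ID plus the new message. A rough sketch, assuming the 2023-era openai Python package; the in-memory dict is purely for illustration, use a real database in practice:

```python
# Sketch: keep conversation history server-side, keyed by an ID, so the
# mobile client sends only {conversation_id, message}.
import uuid
import openai  # assumes the openai package is configured with an API key

conversations: dict[str, list[dict]] = {}  # illustrative in-memory store

def start_conversation() -> str:
    cid = str(uuid.uuid4())
    conversations[cid] = [{"role": "system", "content": "You are a helpful assistant."}]
    return cid

def send_message(cid: str, text: str) -> str:
    history = conversations[cid]
    history.append({"role": "user", "content": text})
    # The API itself is stateless, so the full history still goes to the API
    # on every call; the saving is on the mobile side, not in tokens.
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
    reply = resp["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```

Note that this does not reduce your token cost against the API; it only slims down the mobile client and centralises whatever trimming or summarisation you decide on.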
That would require that each prompt is processed once for completion along with past context, and then processed again to extract a "summary". That would mean 2x processing per query on a system that is already extremely loaded with users. Additionally, ChatGPT is a general-purpose AI, so it does not know whether it is being given code, data sets, or literature; it would be unwise to attempt to summarise technical data or code and expect coherence across prompts. The context window is a known size (4k, 8k, etc.), so one can create test prompts that contain information at the top of padding text, in a block that will roll off exactly when a new prompt is generated. This can be tested, and it consistently loses context, exactly as would be expected with a fixed rolling window.
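A sketch of that kind of probe, with a hypothetical planted fact and filler sized near the window limit:

```python
# Sketch: probe a fixed context window by planting a fact before padding text.
# If the model uses a simple rolling window, the fact becomes unanswerable
# exactly when later turns have pushed it past the window edge.
FACT = "The secret code is AZURE-41."    # hypothetical planted fact
PADDING = ("lorem ipsum " * 400).strip()  # filler sized near the window limit

probe = [
    {"role": "user", "content": FACT + "\n\n" + PADDING},
    # ...several ordinary turns here, then ask for the planted fact...
    {"role": "user", "content": "What is the secret code?"},
]
```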
There are no shortcuts when creating fully coherent context: you can summarise past chats, but you may lose important information if the model does not treat as important something that you later rely on, or you can just include all prompts and replies verbatim, as you have in your example.
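If you do go the summarisation route despite those caveats, it might look something like the following sketch; the split point and the summarisation prompt are arbitrary choices, and the extra call itself costs tokens, as noted earlier in this thread:

```python
# Sketch: compress older turns into a single summary message while keeping
# the most recent turns verbatim. keep_last and the summarisation prompt
# are illustrative, not a recommendation.
import openai

def compress_history(messages, keep_last=4, model="gpt-3.5-turbo"):
    if len(messages) <= keep_last + 1:  # nothing old enough to compress
        return messages
    system, old, recent = messages[0], messages[1:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user",
                   "content": "Summarise this conversation, keeping any facts, "
                              "decisions, and constraints:\n" + transcript}],
    )
    summary = resp["choices"][0]["message"]["content"]
    return [system,
            {"role": "system", "content": "Summary of earlier conversation: " + summary},
            *recent]
```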
Just use a vector database. It is not perfect for all use cases, but it is cheap, fast, and (usually) accurate.
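A minimal sketch of that idea: embed past turns, then pull only the most relevant ones into the next prompt. Brute-force cosine similarity over numpy arrays stands in for a real vector database here, and the ada-002 embeddings endpoint is assumed:

```python
# Sketch: retrieve only the past turns relevant to the new question, instead
# of replaying the whole history. A real vector database would replace the
# brute-force search below.
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

store: list[tuple[np.ndarray, str]] = []  # (vector, message text)

def remember(text: str) -> None:
    store.append((embed(text), text))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    scored = sorted(
        store,
        key=lambda pair: float(np.dot(q, pair[0]) /
                               (np.linalg.norm(q) * np.linalg.norm(pair[0]))),
        reverse=True,
    )
    return [text for _, text in scored[:k]]
```

Before each API call you would run recall() on the new user question and splice the returned snippets into the prompt, instead of replaying the whole history verbatim.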