Any thoughts about sending diffs on API

alex_gendelman · April 21, 2023, 6:43pm

Hi all, since token limit has grown so much, any chance we’ll be able to send ‘diff’ requests instead of full requests? So instead of a 3000 token request, i’d just send the delta from previous?
Now that we’re dealing with chat models, that receive information in “messages” list, it could be great to just send the appended message, and not the whole thing

sps · April 21, 2023, 6:49pm

Hi @alex_gendelman

Welcome to the community.

iMO such functionality will require a storage solution on behalf of OpenAI. I think you can implement it on your end by creating a proxy with a custom storage solution giving the session an ID to implement it.

Also, chat completion endpoint can also be used for completions as well, implementing a diffs infra would cost a lot for storage and for no good reason. Not to mention the fact that it’ll add to the token cost.

alex_gendelman · April 21, 2023, 7:06pm

The objective would be reducing roundtrip time by not sending the entire package every time. If i don’t need those tokens, I’d just send another message

So implementing this on my side won’t help too much

curt.kennedy · April 21, 2023, 7:58pm

By far the most time spent is on inference, not sending data. Also, with a ‘diff’ scheme, I’m not sure how I would control the history that the AI is sending through the transformer network. What if I wanted the AI to suddenly forget something in the recent past? For example, the topic changes. Not controlling this would drive me nuts!

alex_gendelman · April 22, 2023, 4:31am

This is all true, but use cases vary. Also, there’s a number of ways to allow diffs, starting from purely API with caching on OpenAI side, which wouldn’t reduce inference time but would reduce the call time nonetheless, and also through saving partial states in the encoder, which would definitely save some inference time.
So there are approaches to reduce inference time as well.
A less naive approach could be detecting the similarities in prompt on OpenAI side, rather than user side, and just reuse the weights on sequential calls, to some degree.

As for use cases, I see quite a lot of ppl doing chat agents with a large window of context. So the window is obviously sliding, but one could design a simple architecture to slide the actual window in jumps, and if diffs are possible, reduce the roundtrip if that were possible.

Not trivial implementation, but for chat applications I think it could be useful.

qrdl · May 21, 2023, 12:37am

This is actually a huge problem, but I think it’s likely unsolvable due to the nature of LLMs.

I’ve tried a number of things and my conclusion is that reasoning degrades significantly when tasked with thinking in ‘diffs’. They work much better when they can do the next word prediction on complete snippets.

If anyone has found a way to make this work as well as regular prompts that would be great. Love to be proven wrong here because the lack of diffs in chatting/utilizing content with LLMs a lot more complicated.

alex_gendelman · June 1, 2023, 6:12pm

Looks like OpenAI do have some plans to help us with that.

A stateful API is planned according to Altman.

Topic		Replies	Views
Efficient stateful completion chatbot API	10	4859	July 9, 2024
Is it possible to reuse previous chat history on the OpenAI side to avoid sending repetitive tokens? API	5	2609	January 11, 2024
Introducing Predicted Outputs Announcements	15	7202	November 18, 2024
Feature Request: Enhanced Diff Format Support in ChatGPT for Streamlined Code Integration Prompting github , gpt-4 , developers	9	2439	January 31, 2024
The cumulative token problem and role = system usage, options? API	9	4067	February 16, 2024

Any thoughts about sending diffs on API

Related topics