Can we convert a long system message to embeddings once and send that over during each api call?

Sending the same long message on every request gets costly and adds latency. Can we just convert it to embeddings once and send those each time instead of the raw text?
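To make the cost concrete, here is a rough back-of-the-envelope sketch. The ~4 characters per token heuristic and the call volume are illustrative assumptions, not measurements from OpenAI's tokenizer:

```python
# Rough illustration: estimate the overhead of resending the same
# long system prompt on every API call.
system_prompt = "You are a helpful assistant. " * 200  # stand-in for a long system message

def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

calls_per_day = 10_000  # hypothetical traffic
tokens_per_call = approx_tokens(system_prompt)
daily_overhead = tokens_per_call * calls_per_day
print(f"~{tokens_per_call} tokens resent per call, "
      f"~{daily_overhead:,} input tokens per day just for the system prompt")
```

Every one of those input tokens is billed and processed on every call, which is what motivates the question.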

No. There is no such API.

The internal state is likely to be ridiculously huge, and also proprietary.

If there were some hash-table lookup to preload a precomputed context, facilitated by the messages format, it would exist for OpenAI's benefit. That is the kind of thing that could be pursued to optimize ChatGPT Plus, whose large system context is used millions of times a day.

For you, the way to reduce input is to fine-tune a model, so the behavior of the long system prompt is baked into the weights and you can send a much shorter one.
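As a sketch of what that looks like: OpenAI's fine-tuning data uses chat-formatted JSONL, where each example pairs a (now much shorter) system message with the inputs and outputs you want. The prompt text and examples below are placeholders:

```python
import json

# Each training example teaches the model the behavior you previously
# spelled out in the long system prompt, so production calls only need
# a short system message.
examples = [
    {
        "messages": [
            {"role": "system", "content": "Short replacement prompt."},
            {"role": "user", "content": "An example user input."},
            {"role": "assistant", "content": "The desired assistant output."},
        ]
    },
    # ...more examples covering the behaviors the long prompt encoded
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

You would then upload `train.jsonl` through the fine-tuning API and call the resulting model with the short prompt instead of the long one.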
