Caching functions listing

The team at PromptMule.com has a hosted cache-as-a-service that supports OpenAI API calls, and it has free trials now. Check it out.

I’d say it depends on the kind of data you want to cache.

Let’s say you use the model to translate a user’s request into something your backend understands.

Then you don’t want to store the user’s requests with per-phrase-to-response logic, and you also might not want to cache the data like this:

“question of the user” : “some command for the backend - #1”
“another question of the user with the same intent” : “the same command for the backend but with #2”

but rather like this:

“question of the user” : “some command for the backend - #1”
“another question of the user with the same intent” : “same command for the backend - #1”
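
In code, a minimal sketch of that could look like the following. Note that classify_intent is a hypothetical helper (in practice it might be a cheap model call, embeddings, or a rule-based classifier), not a real API:

```python
cache: dict[str, str] = {}

def classify_intent(user_request: str) -> str:
    # Hypothetical stand-in: collapse a request to a canonical intent label.
    # A real system would use a classifier, embeddings, or a cheap model call.
    return " ".join(sorted(set(user_request.lower().split())))

def get_backend_command(user_request: str, generate) -> str:
    # Key the cache by the normalized intent, not the raw phrase, so that
    # paraphrases with the same intent reuse the same backend command.
    key = classify_intent(user_request)
    if key not in cache:
        cache[key] = generate(user_request)  # expensive call only on a miss
    return cache[key]
```

The point is just that the cache key is the normalized intent rather than the raw phrase, so paraphrases hit the same entry.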

But for other use cases you might instead do a similarity check on a matrix built from keyword density, or better yet use the Jaccard similarity coefficient.
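
As a rough sketch of the Jaccard approach (the 0.6 threshold and the whitespace tokenization here are arbitrary choices, just for illustration):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|: 1.0 for identical keyword sets, 0.0 for disjoint ones.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cached_lookup(question: str, cache: dict[str, str], threshold: float = 0.6):
    # Return the cached answer whose stored question is most similar to the
    # new one, if the similarity clears the threshold; otherwise None.
    tokens = set(question.lower().split())
    best_key, best_score = None, 0.0
    for stored_question in cache:
        score = jaccard(tokens, set(stored_question.lower().split()))
        if score > best_score:
            best_key, best_score = stored_question, score
    if best_key is not None and best_score >= threshold:
        return cache[best_key]
    return None
```

On a miss you would call the model, store the new question/response pair, and serve future near-duplicates from the cache.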

It also depends on whether you want to find things based on keywords, whether you want to rank the candidate responses, and so on. There are so many options and no “this solution fits all”. Like, hey, why don’t you just connect this vector DB, or why not use a hybrid of vector DB + Elasticsearch + MongoDB + RDBMS + graph DB? It really depends on your data.

https://chat.openai.com/share/4959547c-42c4-421e-878b-6ec345213bcb

Wow, I am really impressed that ChatGPT picked up on my humor there. Many people can’t without an emoji.

Curious to learn more about your use case. How high-throughput a system are you building?

Thank you for the detail, Jochen. I agree there is nuance to how you build and use the cache depending on the use case the app implements. My use case is fairly simple: I’m building a demonstration of a copywriting tool that will assist writers with ads, taglines, etc. So I expect there will be some repetition, but that nuance is key to the design. I will consider this as I look closer at it.

Jay, this is a fairly low-throughput system. I do not expect more than about 100 events per minute, which likely translates to roughly 80-100 kbps per user (100 events/min ÷ 60 ≈ 1.6 events/sec × 12 bytes per token × 32k tokens ≈ 80 kbps).
