Response speed with semantic search

Hello, I’m quite new to AI and ChatGPT. I built an application that uses semantic search so that my chatbot, which uses ChatGPT’s chat API, answers as factually as possible. I store each string of data and its vector counterpart in a MySQL database. I’ve found that my chatbot answers pretty slowly, even though I’m using the chat completions API and streaming the response. I’ve done some research about this and come to the following opinions/thoughts/conclusions:

  1. Limit the data being stored.
  2. The OpenAI API responses are naturally slow.
  3. Use a vector database instead of manually computing cosine similarity in my code over rows fetched from MySQL. (Although I don’t know anything about vector databases and there’s a good chance I can’t switch to one in production; but I don’t know much about this subject, so I’m open to input.)

What do you guys think about it?

Hi, automatic knowledge retrieval can indeed make AI smarter about your own information. If you’re working with a non-specialized database, here are some thoughts of mine:

- Keep embedding vectors in memory: about 6 kB per vector when stored as float32 (e.g. 1536 dimensions × 4 bytes), with extra efficiency when the implicit list index corresponds to the chunk number (see the sketch after this list).
- Use larger chunks, so there are fewer vectors to score on each search.
- Make database calls in parallel. Speculative calls on high-similarity results, before you establish your final top_n results, are also a possible technique.
- Check whether the latency is really on your side by inserting timers and time reports.
- Check your payment tier under your API account’s “Limits” page. Tier 1 can get hit with slow-output models.
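
To make the first and fourth points concrete, here’s a minimal sketch of loading everything into one NumPy matrix at startup and timing each search. The schema, credentials, and the assumption that embeddings are stored as JSON text are all mine; adjust to your own tables:

```python
import json
import time

import mysql.connector  # pip install mysql-connector-python
import numpy as np

# Hypothetical schema: chunks(id INT, content TEXT, embedding JSON) -- adjust to yours.
conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="kb"
)
cur = conn.cursor()
cur.execute("SELECT content, embedding FROM chunks ORDER BY id")
rows = cur.fetchall()

texts = [content for content, _ in rows]
# One big float32 matrix; row i lines up with texts[i], so no per-row lookups later.
matrix = np.array([json.loads(emb) for _, emb in rows], dtype=np.float32)
# Normalize once, so a plain dot product is the cosine similarity.
matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)

def top_n(query_vec, n=5):
    start = time.perf_counter()  # timer: find out where the latency actually is
    q = np.asarray(query_vec, dtype=np.float32)
    q /= np.linalg.norm(q)
    scores = matrix @ q          # one vectorized pass over every chunk
    best = np.argsort(scores)[::-1][:n]
    print(f"vector search took {(time.perf_counter() - start) * 1000:.2f} ms")
    return [(texts[i], float(scores[i])) for i in best]
```

With timers like this around the retrieval step and around the chat completion call, you can tell whether the slowness is in your search or in the API.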


basically what j said

how many vectors do you have?

it’s very common to keep all your vectors in memory. if you’re using python, it’s trivial to parallelize your vector search - if you have something on the order of ~1000 vectors, you can try that. even if you run your stuff on a potato - most potatoes nowadays have at least 4 cores.
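
here’s a rough sketch of what i mean, with made-up data - numpy matmuls release the GIL, so plain threads are enough:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Made-up data: N normalized embedding vectors of dimension d.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 1536)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

def parallel_top_k(query, k=5, workers=4):
    q = (query / np.linalg.norm(query)).astype(np.float32)
    # Split the row indices into one shard per worker.
    shards = np.array_split(np.arange(len(vectors)), workers)

    def score(ids):
        return ids, vectors[ids] @ q  # matmul releases the GIL, so threads run in parallel

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(score, shards))

    ids = np.concatenate([i for i, _ in parts])
    scores = np.concatenate([s for _, s in parts])
    best = np.argsort(scores)[::-1][:k]
    return ids[best], scores[best]

top_ids, top_scores = parallel_top_k(rng.standard_normal(1536))
```

honestly, at ~1000 vectors a single vectorized matmul is already fast; the sharding starts paying off when the matrix gets big.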

what you can also try is using FAISS with an HNSW index if you have tons and tons of vectors.
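
if you want to see what that looks like, here’s a rough sketch - sizes and parameters are made up, and normalizing plus using the inner-product metric makes the scores cosine similarities:

```python
import faiss  # pip install faiss-cpu
import numpy as np

d = 1536                                   # e.g. ada-002 embedding size
xb = np.random.random((100_000, d)).astype("float32")
faiss.normalize_L2(xb)                     # normalize so inner product == cosine similarity

# HNSW graph index: 32 neighbors per node; approximate but very fast at scale.
index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
index.hnsw.efSearch = 64                   # higher = better recall, slower queries
index.add(xb)                              # build the graph

xq = np.random.random((1, d)).astype("float32")
faiss.normalize_L2(xq)
scores, ids = index.search(xq, 5)          # top-5 approximate neighbors
```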

if you know your way around docker (even if you don’t, it’s probably worth learning) - spinning up a milvus vector db is super easy. that’s also an option.
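
and if you do spin one up, the client side is pretty small too. a hedged sketch roughly following the pymilvus quickstart - the collection name, dimension, and localhost uri are my assumptions, so double-check the current milvus docs:

```python
import numpy as np
from pymilvus import MilvusClient  # pip install pymilvus

# Assumes a Milvus standalone container is already listening on the default port.
client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="chunks", dimension=1536)

# Insert one made-up chunk; in practice the vectors come from your embedding calls.
vec = np.random.random(1536).astype(np.float32).tolist()
client.insert(
    collection_name="chunks",
    data=[{"id": 0, "vector": vec, "text": "some chunk of your documents"}],
)

# Search returns the closest chunks plus any requested fields.
hits = client.search(
    collection_name="chunks",
    data=[vec],
    limit=5,
    output_fields=["text"],
)
```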

some people like pinecone, but I can’t speak to that.


regarding OpenAI being slow:

that’s unfortunately something that people on lower tiers have reported. I’d say it becomes faster as you move up in your usage tiers, but unfortunately there’s probably no guarantee here. In my experience the gpt-4-1106 stuff is plenty fast, but apparently your mileage really does vary. Telling you to “just spend a little more money” leaves a bad taste in my mouth. One thing you could try is the Microsoft-hosted models (Azure OpenAI), to see if you’re getting the performance you need.
