Response speed with semantic searching

Hello, I’m quite new to AI and ChatGPT. I built an application that uses semantic search so that my chatbot, which uses ChatGPT’s chat API, answers as factually as possible. I store each text chunk and its embedding vector in a MySQL database. I’ve found that my chatbot answers pretty slowly, even though I’m using the chat completions API and streaming the response. I’ve done some research about this and arrived at the following opinions/thoughts/conclusions:

  1. Limit the data being stored.
  2. The OpenAI API responses are naturally slow.
  3. Use a vector database instead of computing cosine similarity manually in my code over rows pulled from MySQL. (Although I don’t know anything about vector databases, and there’s a good chance I can’t switch to one in production, I don’t know much about this subject yet, so I’m open to input.)

What do you guys think about it?

Hi, automatic knowledge retrieval can indeed make AI smarter about your own information. If you’re working with a non-specialized database, here are some thoughts of mine:

- Keep embedding vectors in memory: about 6 kB per vector when stored as 32-bit floats (1536 dimensions × 4 bytes for ada-002), with extra efficiency when the implicit list index corresponds to the chunk number.
- Use larger chunks: fewer vectors means less math when searching through them all.
- Parallelize database calls. Speculative calls on high-similarity results, before you establish your final top_n results, is another possible technique.
- See if the latency is really your fault by inserting timers and time reports.
- Check the payment tier in your API account’s “limits” page. Tier 1 can get hit with slow-output models.
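To act on the timer suggestion, here’s a minimal sketch. The stage names and stub functions are hypothetical stand-ins (assuming a retrieve-then-generate pipeline); swap in your real embedding call, MySQL scan, and chat completions call:

```python
import time

def timed(label, fn, *args, report=None, **kwargs):
    """Run fn, record its wall-clock duration under `label`, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if report is not None:
        report[label] = time.perf_counter() - start
    return result

# Hypothetical pipeline stages, stubbed so the sketch runs standalone.
def embed_query(q): return [0.0] * 1536        # stand-in for the embeddings API call
def search_vectors(v): return ["chunk-1"]      # stand-in for the MySQL similarity scan
def call_chat_api(chunks): return "answer"     # stand-in for the chat completions call

report = {}
vec = timed("embed", embed_query, "user question", report=report)
chunks = timed("search", search_vectors, vec, report=report)
answer = timed("chat", call_chat_api, chunks, report=report)
for stage, seconds in report.items():
    print(f"{stage}: {seconds * 1000:.1f} ms")
```

If the “chat” stage dominates, the bottleneck is the API, not your retrieval.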


basically what j said

how many vectors do you have?

it’s very common to keep all your vectors in memory. if you’re using python, it’s trivial to parallelize your vector search - if you have something on the order of ~1000 vectors, you can try that. even if you run your stuff on a potato - most potatoes nowadays have at least 4 cores.
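For the in-memory approach, a sketch using NumPy (the array names and sizes are assumptions). With row-normalized embeddings, cosine similarity reduces to a dot product, so one matrix-vector product scans every vector at once - no explicit parallelization needed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in corpus: N chunk embeddings, loaded once from MySQL into an (N, d) matrix.
N, d = 1000, 1536
embeddings = rng.standard_normal((N, d)).astype(np.float32)
# Normalize rows up front, so cosine similarity is just a dot product later.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def top_n(query_vec, n=5):
    """Return (indices, scores) of the n chunks most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ q                    # one vectorized pass over all rows
    idx = np.argpartition(scores, -n)[-n:]     # top-n without a full sort
    idx = idx[np.argsort(scores[idx])[::-1]]   # sort just those n, best first
    return idx, scores[idx]

idx, scores = top_n(embeddings[42])  # querying with a stored vector: best match is itself
print(idx[0], scores[0])
```

At this scale the whole search is a single matrix multiply, which will almost certainly be faster than round-tripping rows out of MySQL.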

what you can also try is using FAISS with a HNSW index if you have tons and tons of vectors.

if you know your way around docker (and even if you don’t, it’s probably worth learning), spinning up a milvus vector db is super easy. that’s also an option.

some people like pinecone, but I can’t speak to that.

regarding openAI being slow:

that’s unfortunately something that people in lower tiers have reported. I’d say it becomes faster as you move up the usage tiers, but unfortunately there’s probably no guarantee here. In my experience the gpt-4-1106 models are plenty fast, but apparently your mileage really does vary. Telling you to “just spend a little more money” leaves a bad taste in my mouth. One thing you could try is the microsoft offering (Azure OpenAI), to see if you get the performance you need.
