Assistants API speed and hallucination tricks I learned while building my production app

Hi Folks,

I’ve spent the last two months using the Assistants API to build out www.docmonster.ai and
I wanted to share the best tricks I’ve found for getting to somewhat production-level speed. By that I mean responses in under about 1.2 seconds.

Context
Model: gpt-3.5-turbo

Tip 1 for speed:
Every exchange has three parts:
Thread,
User message,
Run.

Messages and runs both live inside a thread. My biggest speed improvement came from splitting out the first part, thread creation. When the user opens my bot by clicking on the + symbol, the frontend sends a request to the backend to initialise a thread and have it ready. Then, when the user sends a message, the message and its run are added to that existing thread. The thread ends when the user closes the bot.
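To make the idea concrete, here is a minimal sketch of pre-creating the thread, not the actual docmonster code. It assumes `client` is an OpenAI Node SDK instance (or any object exposing `beta.threads.create`, `beta.threads.messages.create` and `beta.threads.runs.create`). The in-memory `threadsBySession` map and the function names are hypothetical.

```javascript
// Hypothetical in-memory store: sessionId -> Promise that resolves to a thread id.
const threadsBySession = new Map();

// Call when the user clicks the + symbol. Starts thread creation right away
// and caches the pending promise so later calls reuse the same thread.
function warmThread(client, sessionId) {
  if (!threadsBySession.has(sessionId)) {
    const pending = client.beta.threads.create().then((thread) => thread.id);
    // Drop the cached entry if creation fails so the next call retries.
    pending.catch(() => threadsBySession.delete(sessionId));
    threadsBySession.set(sessionId, pending);
  }
  return threadsBySession.get(sessionId);
}

// Call when the user sends a message. The thread is usually ready by now,
// so only the message and the run remain on the critical path.
async function sendMessage(client, sessionId, assistantId, text) {
  const threadId = await warmThread(client, sessionId);
  await client.beta.threads.messages.create(threadId, { role: "user", content: text });
  return client.beta.threads.runs.create(threadId, { assistant_id: assistantId });
}

// Call when the user closes the bot.
function endSession(sessionId) {
  threadsBySession.delete(sessionId);
}
```

The saving comes from moving one network round trip (thread creation) to a moment when the user is still typing. A real backend with several dynos would need a shared store instead of a per-process map.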

Tip 2 for speed:

Start independent requests together with Promise.all instead of awaiting them one after the other.
const [sentMessage, run] = await Promise.all([sentMessagePromise, runPromise]);

This made a big difference as well.
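Here is a small sketch of the difference between the two styles. `first` and `second` are hypothetical stand-ins for any two async calls. This only pays off when neither call needs the other's result.

```javascript
// Sequential: the second call does not start until the first has finished,
// so the total time is roughly the sum of both.
async function runSequential(first, second) {
  const a = await first();
  const b = await second();
  return [a, b];
}

// Concurrent: both calls are started immediately and awaited together,
// so the total time is roughly that of the slower call.
async function runConcurrent(first, second) {
  const firstPromise = first();
  const secondPromise = second();
  return Promise.all([firstPromise, secondPromise]);
}
```

Note that Promise.all rejects as soon as either promise rejects, so wrap the call in try/catch if one failure should not discard the other result.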

Tip 3 for speed: More dynos

My backend was running on Heroku and I was using one Eco dyno. Adding 2 Standard dynos made a world of difference. Don’t judge the API by the speed you get on your local server. Even without streaming it’s not horrible!

Tip for Hallucinations:
I had to keep gaslighting my bot to make it retrieve from the file retrieval tool. It would keep saying it was unable to access files, and then when I replied with “try again you have access, it works”, it would retrieve them.

I added this to the instructions when my bot is created to solve this: “If the myfiles_browser tool doesnt work the first time, try again till it works”

myfiles_browser seems to be the internal name of the file retrieval tool. Adding this instruction both when the assistant is created and at the message level has made it stop telling me it couldn’t access my files.

It’s possible this is coincidence and the instruction doesn’t really help, but it’s consistently worked for me.
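One way to apply the same hint in both places is sketched below. The helper names are hypothetical, and appending the hint to the user message is only one reading of "message level". The resulting strings would be passed as the `instructions` field when creating the assistant and as the message `content` when adding a user message.

```javascript
// The instruction text quoted in the post, kept verbatim.
const RETRY_HINT =
  "If the myfiles_browser tool doesnt work the first time, try again till it works";

// Append the hint to the assistant's base instructions (used once, at creation).
function buildAssistantInstructions(baseInstructions) {
  return `${baseInstructions.trim()}\n\n${RETRY_HINT}`;
}

// Append the hint to each user message before it is added to the thread.
function buildUserMessage(text) {
  return { role: "user", content: `${text.trim()}\n\n${RETRY_HINT}` };
}
```

Keeping the hint in one constant makes it easy to remove later and check whether it is really what fixes the behaviour.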

Anyway, that’s it from me. Hope these tips were useful. If you want to try the bot’s speed, you can do it on my product for free at www.docmonster.ai to see what I mean. Plus, I’m launching what I think may be one of the first few production apps using the Assistants API, so wish me luck!


Thanks for sharing your tips and suggestions with us!

Thanks for sharing!

Noob question: Why did you use the Assistants API instead of the Chat API?

If I understand docmonster’s use case correctly, you’re basically building “ChatGPT over my API docs”. Wouldn’t a simple RAG architecture on top of the Chat API have sufficed?


RAG is included with Assistant. You can use Assistant for that, and that’s what OP did.

Thanks a lot & good luck with Docmonster @macroguy! Very interesting.

Are you loading just one document into the assistant, or more?
Gaslighting also helped me, but only for a few attached docs (1-3), not the official 20.

And are you using the citations feature? That one was also not working most of the time.

Well done! Thanks for the tips :)

Yeah, I’m aware. But what’s the benefit of using the Assistant API over the Chat API for this use case?

The Chat API is tried and tested, the Assistant API isn’t.

The most important benefit is simply the ease of release. You don’t have to worry about context, message history, RAG, etc. You simply configure the Assistant and use the API. This is INCREDIBLE for getting your product into production and testing it to validate early hypotheses.

I still have not seen any real advantage in the long run. You have more control creating your own RAG system.

Great tips… especially the speed tips. Well done :)