I’ve been building a product assistant for use on a large, global eCommerce web site (selling complex engineering products). The experience has largely been good, but not completely production ready (hence the Assistant API being in Beta I guess!)
My data source was about 6,000 PDF files of product data totalling around 6 GB. The bulk of the work involved in creating the assistant was sorting out this data!
My first discovery was that the Assistant API doesn’t really seem to like zip files! I’m not sure why, but I have never managed to create an assistant from them.
So, the next logical step was to concatenate the PDF files so that I had fewer than 20 files, each below 512 MB. I had to write a program to do this. It was then that I discovered the additional 2,000,000-token-per-file limit! D’oh! After much experimentation, I found that limiting each concatenated PDF file to 100 MB seemed to hit the sweet spot. This meant I could only include about 2 GB of the 6 GB I wanted, but it’s good enough for a prototype.
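For anyone attempting the same thing, here is a minimal sketch of the kind of concatenation script I mean. It assumes the pypdf library; the folder names and the 100 MB cap are my own choices, and summing the input file sizes is only a rough proxy for the size of the output bundle, but it was close enough for my purposes.

```python
# Sketch: bundle many small PDFs into a handful of ~100 MB files (assumes pypdf).
from pathlib import Path
from pypdf import PdfWriter

SOURCE_DIR = Path("product_pdfs")      # placeholder: folder of source PDFs
OUTPUT_DIR = Path("bundles")
MAX_BUNDLE_BYTES = 100 * 1024 * 1024   # ~100 MB per concatenated file

OUTPUT_DIR.mkdir(exist_ok=True)

def flush(writer, index):
    """Write the current bundle to disk and return a fresh writer."""
    out_path = OUTPUT_DIR / f"bundle_{index:02d}.pdf"
    with open(out_path, "wb") as f:
        writer.write(f)
    return PdfWriter()

bundle_index = 1
bundle_bytes = 0
writer = PdfWriter()

for pdf_path in sorted(SOURCE_DIR.glob("*.pdf")):
    size = pdf_path.stat().st_size
    # Start a new bundle once the running total would exceed the cap.
    if bundle_bytes > 0 and bundle_bytes + size > MAX_BUNDLE_BYTES:
        writer = flush(writer, bundle_index)
        bundle_index += 1
        bundle_bytes = 0
    writer.append(str(pdf_path))
    bundle_bytes += size

if bundle_bytes > 0:
    flush(writer, bundle_index)
```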
With all this done (it took about a day and a half to get it working), my assistant started responding reasonably well to queries. It sometimes gets things wrong: when I check the source PDFs, it seems to occasionally misread data in tables (concatenating values from adjacent columns), but it’s not too bad.
My next surprise was learning about context tokens and token limits! I still can’t quite work out what context tokens are, but I am consuming them by the bucket load! I’m not sure whether they relate to the message context or to the files that OpenAI is reading from when answering queries. It’s a bit of a mystery!
I have found the Assistant API to be a bit slow, which is made worse by the lack of streaming. It seems to take 30 seconds (ish) to get a reply, although sometimes this stretches to several minutes.
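Because there is no streaming in the beta, all I can really do is poll the run until it finishes. A rough sketch of what that looks like with the openai Python SDK (v1.x); the assistant ID and the question are placeholders:

```python
# Sketch: ask a question and poll the run until it completes (openai SDK v1.x, Assistants beta).
import time
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the operating temperature range for product X?",  # example query
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# No streaming, so poll until the run leaves the queued/in_progress states.
start = time.time()
while run.status in ("queued", "in_progress"):
    time.sleep(2)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

print(f"Run finished with status '{run.status}' after {time.time() - start:.0f}s")

# Messages are returned newest-first, so the first item is the assistant's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```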
As my OpenAI account is less than a week old, my rate limit is tiny (500,000 tokens per day). Whatever is eating the context tokens is getting through that in fewer than 100 conversations.
I also learned that there is no way to get meta-information about token usage via the Assistant API. The first I knew of the token limit was when I started getting messages saying I’d hit my daily limit and should try again in three minutes!
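Since the beta doesn’t expose per-run token usage, the best I’ve come up with is to catch the rate-limit error and back off. This is only a sketch; the retry timing is my own guess based on the “try again in three minutes” message, not anything the API recommends.

```python
# Sketch: retry run creation after a rate-limit error (openai SDK v1.x).
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def create_run_with_backoff(thread_id, assistant_id, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.beta.threads.runs.create(
                thread_id=thread_id, assistant_id=assistant_id
            )
        except RateLimitError:
            # Daily token cap hit; wait and retry. The three-minute wait is an
            # assumption based on the error message I've been seeing.
            time.sleep(180)
    raise RuntimeError("Still rate limited after retries")
```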
Hopefully, in a couple of days, I’ll get uplifted to the next tier and get 1.5 million tokens per day, but that still seems like a small amount given the context token consumption of the Assistant API.
I still need to do a lot more work on data preparation for the assistant. I can prune the PDFs down and separate them by language (e.g. have an English assistant, a German assistant, etc.). This will help me manage the PDF volume a bit better.
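One way I might do the language split is to detect the language of each PDF’s first page and bucket the files accordingly. This is purely a sketch of one possible approach: pypdf and langdetect are my own choices, and first-page detection is crude, but it would probably be good enough to route files to per-language assistants.

```python
# Sketch: group PDFs by detected language of their first page (assumes pypdf + langdetect).
from pathlib import Path
from pypdf import PdfReader
from langdetect import detect

SOURCE_DIR = Path("product_pdfs")   # placeholder folder name
buckets = {}                        # language code -> list of file paths

for pdf_path in SOURCE_DIR.glob("*.pdf"):
    try:
        first_page_text = PdfReader(str(pdf_path)).pages[0].extract_text() or ""
        lang = detect(first_page_text) if first_page_text.strip() else "unknown"
    except Exception:
        lang = "unknown"            # scanned or unreadable PDFs end up here
    buckets.setdefault(lang, []).append(pdf_path)

for lang, paths in sorted(buckets.items()):
    print(f"{lang}: {len(paths)} files")  # e.g. feed the 'en' bucket to the English assistant
```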
So, in general, I think the Assistant API is good, and will be great when the speed improves, token limits increase, visibility of token consumption improves and the linked file limits increase. It’s probably good enough for now for me to get a limited beta on to the web site (along with great big caveats about accuracy and speed).