Results Of Using The Assistant API

I’ve been building a product assistant for a large, global eCommerce web site selling complex engineering products. The experience has largely been good, but it’s not completely production-ready (hence the Assistant API being in beta, I guess!)

My data source was about 6,000 PDF files of product data totalling around 6 GB. The bulk of the work in creating the assistant was sorting out this data!

My first discovery was that the Assistant API doesn’t seem to like zip files! I’m not sure why, but I have never managed to create an assistant from them.

So, the next logical step was to concatenate the PDF files so that I had fewer than 20 files, each below 512 MB. I had to write a program to do this. It was then that I discovered the additional 2,000,000-tokens-per-file limit! D’oh! After much experimentation, I found that limiting each concatenated PDF file to 100 MB seemed to hit the sweet spot. This meant I could only include about 2 GB of the 6 GB I wanted, but it’s good enough for a prototype.
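For illustration, the batching step can be sketched as pure planning logic. This is only a sketch: the actual PDF merging (which would need a PDF library) is omitted, the 100 MB cap is the empirical sweet spot mentioned above, and the greedy packing strategy is just one reasonable choice, not necessarily what my program does.

```python
MAX_FILES = 20            # Assistants API limit: at most 20 attached files
SIZE_CAP = 100 * 1024**2  # empirical ~100 MB cap to stay under the 2M-token/file limit

def plan_batches(sizes, size_cap=SIZE_CAP, max_files=MAX_FILES):
    """Greedily pack (name, byte_size) pairs into batches under size_cap."""
    batches, current, current_size = [], [], 0
    for name, size in sizes:
        if size > size_cap:
            continue  # a single file over the cap can never fit
        if current and current_size + size > size_cap:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches[:max_files]  # anything beyond the 20-file limit has to be dropped
```

Each returned batch would then be concatenated into one PDF and uploaded as a single file.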

With all this done (it took about a day and a half to get it working), my assistant started responding reasonably well to queries. It sometimes gets things wrong: when I check the source PDFs, it seems to occasionally misread data in tables (by concatenating data from adjacent columns), but it’s not too bad.

My next surprise was learning about context tokens and token limits! I still can’t quite work out what context tokens are, but I’m consuming them by the bucket load! I’m not sure whether they relate to the message context or to the files OpenAI reads from when answering queries. It’s a bit of a mystery!
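One plausible explanation (an assumption on my part, not confirmed anywhere in the docs) is that every run re-sends the instructions, the retrieved document chunks, and the entire thread history as prompt ("context") tokens. A toy estimate, using the rough 1 token ≈ 4 characters rule of thumb, shows how that would blow through a daily quota:

```python
# All token counts here are rough estimates, not real API figures.

def context_tokens_per_run(instructions, retrieval_chunk, history):
    """Approximate prompt tokens for one run (1 token ~= 4 characters)."""
    est = lambda text: max(1, len(text) // 4)
    return est(instructions) + est(retrieval_chunk) + sum(est(m) for m in history)

history = []
total = 0
for turn in range(10):
    msg = "a typical user question about a product spec" * 2
    history.append(msg)
    total += context_tokens_per_run("You are a product assistant.", "x" * 4000, history)

# `total` grows faster than linearly with turns, because the whole history
# is (presumably) re-sent on every run.
```

If this model is roughly right, long threads plus large retrieval chunks would explain the bucket-load consumption.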

I have found the Assistant API to be a bit slow, which is made worse by the lack of streaming. It typically takes around 30 seconds to get a reply, although sometimes this runs into several minutes.

As my OpenAI account is less than a week old, my rate limit is tiny (500,000 tokens per day). Whatever is eating the context tokens is getting through that in fewer than 100 conversations.

I also learned that there is no way to get meta-information about token usage via the Assistant API. The first I knew of the token limit was when I started getting messages saying I’d hit my daily limit and should try again in three minutes!

Hopefully, in a couple of days, I’ll get uplifted to the next tier and get 1.5 million tokens per day, but that still seems like a small amount given the context token consumption of the Assistant API.

I still need to do a lot more work on data preparation for the assistant. I can prune the PDFs down and separate them by language (e.g. an English assistant, a German assistant, etc.). This will help me manage the PDF volume a bit better.

So, in general, I think the Assistant API is good, and it will be great when the speed improves, token limits increase, visibility of token consumption improves, and the linked-file limits increase. It’s probably good enough for now for me to get a limited beta onto the web site (along with great big caveats about accuracy and speed).


Looking forward to your updates with the Assistant API.

I too am currently building a prototype for my travel venture

I’ve noticed that tables in PDF, CSV, or XLSX format do cause erroneous reads and, more often, skipped information.

Very interesting case @nick.mckenna!

I have found the Assistant API to be a bit slow

Which model did you use? Have you compared the quality of responses between them? GPT-4 models are very slow with assistants and it’s actually hard to get a good chatting experience, but GPT-3.5 models are much faster and perform surprisingly well with assistants with large numbers of files.


Our next step is to get the assistant ready for a limited beta rollout. It’s a product adviser in a very technical area for global eCommerce. We’re going to take it slowly and carefully, as we’re concerned about the quality of the answers that the Assistant sometimes gives. For example, we’ve seen the same table misreads that @engagespy mentions. We’ll probably start with logged-in users in the UK to limit the exposure.

I haven’t tried the 3.5 model that @konradk suggests - I didn’t think it worked with Assistants and Retrieval. I will give it a go!

Most of my work this week is going to be on further refining the data-prep step. I need to reduce the number of PDFs that I incorporate into the Assistant. There are a few obvious steps to take here, so it shouldn’t be too hard - it’s just a bit of grunt work that I can automate!

I’ll keep sharing updates as we make progress!


Afaik, Assistants only works with GPT-4, unfortunately. You could try using the Chat Completions API with GPT-3.5, but it won’t keep the context of the conversation in memory like Assistants can; you can build your own custom context/memory system, though, with a vector database etc.

Although Assistants is slow, I prefer it to 3.5 because 3.5 soon forgets what the conversation is about. I’m creating an AI companion app for virtual-reality headsets, and Assistants is really good imo; OK, replies take a few seconds, but it’s the cost that is the biggest obstacle.

From everything I have read so far (no personal experience as of yet), the Assistant API is simply not designed to handle a dataset this large in an efficient or cost-effective manner. As you quickly discovered:

So, a third of the total files you really want to use?

I’ve heard this a lot also. Imagine what the cost will be when you add the other two-thirds of the files.

If you’re happy with the overall performance, then that’s all that matters.

But I really think you may want to consider the RAG approach using the Chat Completion API. It will be way more scalable, far less expensive (to operate on an ongoing basis), and give you at a minimum the same results you are getting now.
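For concreteness, a minimal sketch of that RAG flow: embed document chunks once, retrieve the top-k chunks most similar to the query, and stuff them into a Chat Completions prompt. The vectors below are toy placeholders - in practice you would call an embeddings endpoint (e.g. `text-embedding-3-small`) for both the chunks and the query; the function names and prompt wording are my own, not from any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """index: list of (chunk_text, embedding). Returns the top-k chunk texts."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble the retrieved chunks into a grounded Chat Completions prompt."""
    context = "\n---\n".join(chunks)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```

The prompt string would then be sent as a user (or system) message, with conversation memory handled by whatever history you choose to re-send.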

I mean, it’s certainly none of my business, but I’m just amazed at how many hoops people are willing to squeeze through to get the Assistants API to do what it was not designed to do. The 20-file limit should have been the first clue. Just something to consider.


Hi @nick.mckenna

I can help you. It may seem obvious but here is a tip.

  • Treat it like a human, not like a machine.

I’ve been using this approach for a while; I have built many assistants over the past 6-7 months using the API. The outcome is different, very different. I’ll be glad to share and help if you need.

Good luck


Assistants work fine with 3.5, and you can now utilize uploaded files with 3.5 as well.


In my case the assistant is working with only a roughly 100-page document of product specifications. It contains around 20 simple tables (~7 columns and 4-10 rows each), and I cannot make the assistant retrieve even these data correctly. First it started to round figures up. I thought this was because some columns have a >= parameter (like in the screenshot), but after I removed it and fine-tuned the assistant for days, it is STILL mixing up the data. Have you checked your data, especially when it comes to exact, concrete figures? How did you manage to get the model to give you exact data from tables?

I highly recommend implementing your own RAG system.

The current Retrieval system offered by OpenAI is

  • Black-boxed (you are paying for storage but have no access to it or the metrics generated)
  • Expensive (in both token usage and storage)
  • Unfinished
  • Without a roadmap or updates since its announcement 3 months ago
  • Without keyword search

It’s not terrible. It’s a good start for prototyping and for basic document retrieval. But in your case, and for any production-grade application, you should use your own RAG system, considering how simple they are to implement.

In your case with tables, you are not using the right equipment for the job. You want to focus on keywords more than semantics.

I would recommend processing your tables into a more database-friendly format and then using a vector database such as Weaviate to embed it all.
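One simple way to make tables database-friendly for retrieval (a sketch of my own, with illustrative column names and values) is to flatten each row into a self-contained "column: value" record that carries its own context, so a keyword or vector search can match individual rows instead of a mangled table blob:

```python
def rows_to_records(table_name, header, rows):
    """Turn each table row into one self-describing text record for embedding."""
    records = []
    for row in rows:
        pairs = "; ".join(f"{col}: {val}" for col, val in zip(header, row))
        records.append(f"{table_name} | {pairs}")
    return records

# Illustrative product-spec table
header = ["Part", "Max load (kN)", "Thread size"]
rows = [["B-100", "12.5", "M8"], ["B-200", "24.0", "M12"]]
records = rows_to_records("Bolt specs", header, rows)
```

Each record can then be embedded (or indexed for keyword search) independently, and a query like "max load of B-200" matches exactly one row.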

For those who decide to use Retrieval. Stop. Using. PDFs.

As a reference for costs: the current Assistants Retrieval pricing is $0.20/GB per assistant (and per thread) per day, and they have been silently making it more expensive. So if you have 10 threads each with a 1 GB file (same assistant), you are paying $2/day MINIMUM. This does not include any files attached to the assistant(s) or the tokens used to parse/validate the documents.

Pinecone, a vector database that now offers serverless functionality, costs $0.33/GB/month.
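A quick back-of-envelope comparison of those two price points (a sketch using only the figures quoted above; real bills would also include token and query costs on both sides):

```python
RETRIEVAL_PER_GB_DAY = 0.20     # Assistants Retrieval, per assistant/thread
VECTOR_DB_PER_GB_MONTH = 0.33   # serverless vector DB storage

def monthly_retrieval_cost(gb_per_thread, threads, days=30):
    # Each thread with an attached file is billed separately per day.
    return gb_per_thread * threads * RETRIEVAL_PER_GB_DAY * days

def monthly_vector_db_cost(total_gb):
    # Stored once, shared across all conversations.
    return total_gb * VECTOR_DB_PER_GB_MONTH

# 10 threads x 1 GB: 10 * 1 * 0.20 * 30 = $60/month,
# versus 10 GB stored once in a vector DB: 10 * 0.33 = $3.30/month.
```

Under these assumptions the gap only widens as thread count grows, since vector DB storage doesn’t multiply per conversation.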


“My use case is I have 73k pdf files and my GPT can’t seem to process them all and give me the right answers? I think my account might be bugged? This is a bad product.”
-Some people

I think @mouimet hit the nail on the head: if a human couldn’t do it well, the model probably can’t either.

OpenAI has made some amazing products for coders and non-coders alike, but they seem to struggle with helping people understand how to use them.


When I first started using their products they had an incredible Cookbook that really helped me set my foundation. Now they just throw out these powerful tools and go “idk, figure it out, go crazy”.

There’s no documentation or insights into how Retrieval actually works. It’s insane.

Followed by “Wtf why do I owe OpenAI >$100” :smiling_face_with_tear:



I’m sorry, I was not clear enough. Of course it can do it. The solution is to use ChatGPT as if it were a human teammate, not a machine. The key is there! If you’re having a hard time finding the solution, let me know and I will help you understand.



I am not from the IT field, so I was interested particularly in an “almost-ready-to-use” solution with no or minimal coding needed. This approach is more sophisticated, and I need to explore other fields. Anyway, big thanks to you; I will consider your advice.


Another option is to implement RAG yourself using OpenAI embeddings and the API. This way there is no limit to either the number of PDFs or the size of the content. Here is a sample open-source project to start with: GitHub - Anil-matcha/ChatPDF: Chat with any PDF. Easily upload the PDF documents you'd like to chat with. Instant answers. Ask questions, extract information, and summarize documents with AI. Sources included.

Thanks for correcting me on that, I will definitely try this with 3.5.

When I’ve worked with PDFs and PPTs, I’ve noticed that tables are usually difficult. You can try it yourself by copying a table and pasting it into Notepad: it usually ends up as an unstructured mess. Tables need to be cleaned up for AI consumption.
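As a sketch of what that cleanup can look like (assuming, as is common, that the pasted text separates cell values with runs of spaces - real PDF extractions vary and may need a proper extraction library instead):

```python
import re

def messy_table_to_markdown(raw):
    """Split pasted table text on runs of 2+ spaces and re-emit as Markdown."""
    lines = [l for l in raw.splitlines() if l.strip()]
    rows = [re.split(r"\s{2,}", l.strip()) for l in lines]
    header, *body = rows
    md = ["| " + " | ".join(header) + " |",
          "| " + " | ".join("---" for _ in header) + " |"]
    md += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(md)

# Illustrative paste from a PDF product table
raw = """Part    Max load    Thread
B-100    12.5 kN     M8
B-200    24.0 kN     M12"""
```

An explicit Markdown table like this gives the model clear column boundaries, which plain pasted text does not.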

As far as I understand RAG, there is no guarantee at all that, even for a simple table (say, a price list with 3 columns), the system will return exact figures on request. The same applies if you upload JSON (JSONL) or CSV files.
Does anybody know how to make the Assistant, or another system, do this reliably?

A table of organised information can be processed by an LLM as part of the prompt’s context, but numeric data does not suit semantic search very well; the much simpler model used by embeddings (part of RAG) is typically unable to determine the contextual relevance of 1 and 2 in a table when searching for 1.5.

Essentially, if you have a table you wish to use as input, you should include it in the prompt directly rather than rely on RAG retrieval - unless you store metadata with the table embedding that can be matched by semantic similarity, but that would require experimentation.


Thank you. By “include in the prompt directly”, do you mean literally attaching the file? As with the Assistant in the screenshot, for example?