How to read large files using OPENAI API?

hunterr · September 12, 2023, 2:26pm

I am building a chatbot using OpenAI API Key. I have used gpt-3.5-turbo-16k engine there. I want to read large files such as, pdf, word, excel, csv, text files. I tried uploading a large pdf file of nearly 8 MB, it could not take. Then I compressed that file and again uploaded, it was able to be uploaded but when I start asking questions to it, it could not able to respond due to exceed of token limit. How to deal with the same?

_j · September 12, 2023, 2:28pm

You need to read about the AI models’ context window length limitation.

This understanding is fundamental to programming an AI and understanding why you can’t do what you describe.

hunterr · September 12, 2023, 2:35pm

so you are saying one can no way able to read large files through a chatbot?

sergeliatko · September 12, 2023, 6:51pm

Hey, unless it’s you who built the bot and used a database to retrieve shorter pieces of context from documents to answer user requests…

jahzwolf1955 · September 12, 2023, 7:19pm

You have no choice but chunk your large document. You can give it about a page that is it. A lot of ideas how to do this

You can talk to the big vector analysis guys
Or find a way to word key map the section of the document being asked about

Until OpenAI starts doing this themselves it is on us to do it

I have been happy using word mapping for my use but it doesn’t lend it self to gigantic documents

I suspect the vector analysis chunking works but i wasn’t impressed with using it

I still think key-word mapping is the key I just am not sure how yet

hunterr · September 13, 2023, 8:57am

any idea how to do it like dividing large files into multiple small chunks and apply vector embeddings to retrieve only relevant information?

jahzwolf1955 · October 6, 2023, 2:43am

Chunks you need to split it up somehow. There are a few options you can do a vector analysis and find the relevant text that way

Or you can keyword map them

I suspect there are others

But most just cut it down and stay under the 8192 token limit or pay extra for the 16K engine

Still that is not enough for larger documents

Chunking is your answer this five minutes they are expanding the input every day and is suspect theirs on the chat is 32K

Topic		Replies	Views
Answering questions about text file content API	5	8987	December 15, 2023
Seeking Advice: Uploading Large PDFs for Analysis with GPT-3 API API gpt-35-turbo , chatgpt , fine-tuning , api	7	7051	December 13, 2023
Chatbot with user provided files: how to let GPT have a "overall" view of the file content? API	3	1514	December 16, 2023
File_search with max num results API	4	57	April 30, 2025
Making a chatbot that answers questions from a book API api	3	4966	December 15, 2023

How to read large files using OPENAI API?

Related topics