Document length limit for search

Hi, I was trying semantic search with an uploaded file and got an error saying one of the documents has more tokens than the search endpoint can handle. I think the limit was around 2k tokens (can't remember the exact number).

But the same file works when I upload it with the answers purpose. For now I'm just going to load my data under the answers purpose, since it effectively does semantic search as well.

Is there a reason for that limit? Is there any way to increase it? I'm hoping to build a chatbot-type experience on top of semantic search, but my source document is bigger than the limit. I'll try the answers route and see if that works, since splitting the document into smaller sections seems painful and like bad design work on my side (a rough sketch of what that would look like is below).
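If I do end up having to chunk it myself, it would be something like this. Rough sketch only: the ~2k figure is from memory, and I'm approximating tokens with whitespace-separated words since I don't know the exact tokenizer.

```python
def chunk_document(text, max_words=1500):
    """Split a long document into sections that stay under the search limit.

    Approximates tokens by whitespace-separated words (the exact tokenizer
    and limit are unknown; ~2k tokens is from memory, so 1500 words leaves
    headroom). Splits on blank-line paragraph boundaries so each chunk
    stays coherent; a single paragraph longer than max_words still becomes
    its own chunk.
    """
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```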

Has anyone else run into this, or am I doing something wrong?

Manish


@m-a.schenk If you want a specific example file I can give you one, but I know for sure now that the same file worked for purpose='answers' and did not work for purpose='search'. It failed at search time, not during ingestion and processing on OpenAI's side.
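For context, roughly what I'm doing. This is a minimal sketch with the legacy 0.x Python client; the file name, engine choices, and question are placeholders:

```python
import openai  # legacy 0.x client

# Upload the same JSONL file under both purposes.
f_search = openai.File.create(file=open("docs.jsonl"), purpose="search")
f_answers = openai.File.create(file=open("docs.jsonl"), purpose="answers")

# Search against the file: this is the call that errors at query time
# when a document in the file exceeds the token limit.
openai.Engine("ada").search(
    file=f_search["id"],
    query="my question",
    max_rerank=5,
)

# Answers against the same data: this works fine.
openai.Answer.create(
    search_model="ada",
    model="curie",
    question="my question",
    file=f_answers["id"],
    examples_context="Some example context.",
    examples=[["Example question?", "Example answer."]],
    max_tokens=50,
)
```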


Yes, the exact same file in both cases. 🙂

The answers endpoint does some chunking, specifically on newlines. You can probably see what it's doing most easily if you pass return_prompt=True in your request.

Update: Ah, that wasn't quite the right answer. When you upload a file under the "answers" purpose, the API assumes you're trying to upload a document for the answers endpoint and chops up your document on newlines in order to increase the chance that relevant parts of the document will fit in the context. That's probably why you're seeing your search documents get all chopped up.
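For example, a sketch with the legacy client (the file ID is a placeholder for the answers-purpose upload):

```python
import openai  # legacy 0.x client

resp = openai.Answer.create(
    search_model="ada",
    model="curie",
    question="my question",
    file="file-abc123",  # placeholder ID from the answers-purpose upload
    examples_context="Some example context.",
    examples=[["Example question?", "Example answer."]],
    max_tokens=5,
    return_prompt=True,
)

# With return_prompt=True the response should include the constructed
# prompt, where you can see the newline-chunked document fragments.
print(resp["prompt"])
```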
