Hello. I’m testing the Assistants API and noticed that it doesn’t split my files into smaller chunks. I uploaded two files, one with 26,470 characters (11,767 tokens) and the other with 66,882 characters (29,725 tokens). My text isn’t in English, so according to the OpenAI tokenizer it averages only about 2.25 characters per token.
I asked a question (7 input tokens) and received an answer (392 output tokens). However, on the Usage page I see that this single request consumed 37,290 context tokens and generated 456 tokens.
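If I’m reading the list prices for gpt-4-1106-preview correctly ($0.01 per 1K input tokens, $0.03 per 1K output tokens), that one exchange alone comes to roughly:

```python
# Rough cost of one request, assuming gpt-4-1106-preview list prices
# of $0.01 per 1K input tokens and $0.03 per 1K output tokens.
input_cost = 37_290 / 1000 * 0.01   # ~$0.37 for the context
output_cost = 456 / 1000 * 0.03     # ~$0.01 for the answer
print(f"${input_cost + output_cost:.2f} per question")  # ~$0.39
```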
In 40 minutes of testing, 59 API requests used about 170k tokens and cost $4.70. It’s crucial to be able to customize chunk_size, chunk_overlap, and the match score. Splitting files into smaller chunks, running a semantic search, and sending only the relevant chunks to the model would be far more efficient (see the sketch at the end of this post).
The model used is gpt-4-1106-preview; Code Interpreter is off, Retrieval is on.
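For illustration, here is a minimal client-side sketch of the kind of retrieval I’m describing. The parameters chunk_size, chunk_overlap, match_score, and top_k (and the file name my_document.txt) are my own illustrative choices, not existing Assistants API options:

```python
# Client-side sketch: chunk the documents, embed the chunks, and send only
# the chunks relevant to the question instead of the whole file.
import math
from openai import OpenAI

client = OpenAI()

def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into overlapping character-based chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - chunk_overlap
    return chunks

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a list of texts with the embeddings endpoint."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in response.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(question: str, chunks: list[str], chunk_embeddings: list[list[float]],
             top_k: int = 3, match_score: float = 0.75) -> list[str]:
    """Return up to top_k chunks whose similarity to the question exceeds match_score."""
    [q_emb] = embed([question])
    scored = sorted(
        ((cosine(q_emb, emb), chunk) for emb, chunk in zip(chunk_embeddings, chunks)),
        reverse=True,
    )
    return [chunk for score, chunk in scored[:top_k] if score >= match_score]

# Chunk and embed the uploaded documents once, then answer questions
# with only the relevant chunks in the context.
document_text = open("my_document.txt", encoding="utf-8").read()
chunks = split_into_chunks(document_text)
chunk_embeddings = embed(chunks)

question = "..."  # my 7-token question
context = "\n\n".join(retrieve(question, chunks, chunk_embeddings))
answer = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

With ~1,000-character chunks like these, a question such as mine would send only a couple of thousand context tokens instead of 37K.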