Input token count in the Playground for a non-English language

rsam44073 · April 21, 2024, 12:54pm

Hello everybody,
I created an assistant with a file attached. I asked a short question of about 27 tokens, but the Playground shows 8460 input tokens. The language is not English, and I know input token counts can be higher for non-English languages, but here the count is roughly 300 times larger. I checked the usage dashboard, and the cost there confirms the abnormal increase in input tokens. Why does this happen?

vb · April 21, 2024, 1:15pm

Hi!
The contents of the file also count as input tokens, assuming you are using the Assistants API.
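To put the numbers from the question side by side, here is a rough calculation. It assumes everything beyond the 27-token question comes from file content, assistant instructions, and similar overhead; the exact split is not visible in the Playground figure, so treat this as an illustration only.

```python
# Figures reported in the question above.
question_tokens = 27         # approximate size of the question
billed_input_tokens = 8460   # input tokens shown in the Playground

# Tokens not explained by the question itself (assumed to be file
# content, instructions, and other overhead added to the request).
extra_tokens = billed_input_tokens - question_tokens

# How many times larger the billed input is than the question alone.
ratio = billed_input_tokens / question_tokens

print(extra_tokens)
print(round(ratio))
```

So the question itself accounts for well under 1% of the billed input.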

rsam44073 · April 22, 2024, 8:11am

Thanks for your reply. This way the input cost grows. Is there a method that doesn't need the whole file as input tokens?

vb · April 22, 2024, 8:43am

Yes, this is a typical challenge, and solving it requires you to build a custom retrieval mechanism.
Based on the question, you provide the model with only the most relevant parts of your input file. This is called RAG (retrieval-augmented generation), and in this context it is based on embeddings.
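A minimal sketch of that idea, to show the shape of it rather than a production setup. The `embed` function here is a toy word-count vector standing in for a real embedding model, and all function names and the sample chunks are made up for illustration. In practice you would replace `embed` with calls to an embedding model and store the chunk vectors ahead of time.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Build a prompt that contains only the most relevant chunks."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical file content, already split into chunks.
document_chunks = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is open Monday to Friday from 9am to 5pm.",
    "Shipping to Europe takes between 3 and 7 business days.",
]

prompt = build_prompt("How long do refunds take?", document_chunks, k=1)
print(prompt)
```

Only the refund chunk ends up in the prompt, so the input tokens scale with `k` and the chunk size instead of with the size of the whole file.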

On the other hand, the Assistants API gives us a standard solution with high token usage, but it’s also very likely that the LLM can answer the question because all context is provided.

I suggest you read up on this topic and decide if it’s worth it for you. If you have more questions, feel free to ask here in the community.

You will be most interested in question answering:
https://platform.openai.com/docs/guides/embeddings/embeddings
