Hi, guys! I need to calculate tokens for OpenAI file uploading. I need to know the number of tokens so that, if it exceeds the limit, I can reject the upload. Do you know how to do this?
Welcome to the community!
It’s fairly straightforward: you extract the text and use tiktoken (GitHub: openai/tiktoken, a fast BPE tokeniser for use with OpenAI’s models). Just make sure you use the correct tokenizer; different models may use different ones!
Thank you! And if the file includes images?
If you’re asking about vector storage, I don’t think images are actually considered at the moment. And if you are asking about storage, the pricing is different: you’re charged by the GB, not by tokens:
https://openai.com/api/pricing/
but calculating image cost is a bit more complicated. Here’s a start:
https://platform.openai.com/docs/guides/vision/calculating-costs
(note though that gpt-4o-mini has a significantly different token per tile cost)
you can also play with the image cost calculator here: https://openai.com/api/pricing/
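To make the vision guide concrete, here’s a sketch of the tile-based formula it describes: the image is scaled to fit within 2048×2048 px, then scaled so its shortest side is 768 px, and you pay a base cost plus a per-tile cost for each 512 px tile. The defaults below are the documented gpt-4o values (85 base, 170 per tile); as noted above, gpt-4o-mini uses much larger per-tile numbers, so pass those in instead for that model:

```python
import math


def image_tokens(width: int, height: int, detail: str = "high",
                 base: int = 85, per_tile: int = 170) -> int:
    """Estimate vision token cost per the tile formula in the docs.

    Defaults are the gpt-4o values; override base/per_tile for
    models with different tile costs (e.g. gpt-4o-mini).
    """
    if detail == "low":
        # Low-detail images cost a flat base amount.
        return base
    # Scale down so the longest side fits within 2048 px.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Then scale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Count the 512 px tiles needed to cover the image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return base + tiles * per_tile
```

For example, a 1024×1024 image in high detail works out to 765 tokens with the gpt-4o numbers, which matches the worked example in the vision guide.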
Thank you! Maybe you could help me with my other post?
/t/image-extraction-using-the-openai/894415/2
This is only half true, Diet. You’re right, for storage, Vector Storage and File Storage are being charged by the GB.
But, @vladymyr.r, when you query an AI using a file in Vector Storage, you are still charged by the token for the text search (or whatever) you’re performing. You may still want to reject large files if you don’t want the model searching through a ton of tokens when it gets queried, because that can still balloon costs.