Average cost of API usage parsing data from 6000 websites

We want to build out a tool that uses the OpenAI API to go through 6,000 website domains from a CSV spreadsheet. The goal is to have the tool look at each site (or its description) and classify it into one of 6 categories (e.g. Infra, Marketplace, Exchange, etc.).

How would I approach estimating the token/API cost of this?

Collect the text and count the tokens.

Got it. Text of each domain, or the description?

Text of whatever you want the model to process.
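
If you want an exact count rather than an estimate, you could run that text through the tiktoken library. A minimal sketch, assuming the cl100k_base encoding (swap in whatever encoding matches your model):

```python
# Minimal sketch: exact token count with tiktoken instead of a word-count estimate.
# The encoding name is an assumption; use the one that matches your model.
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

sample = 'www.TestWebsite.com "This is a test website with a test description"'
print(count_tokens(sample))
```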

This is just my opinion (not a definitive answer) on how I would approach this:

  1. Take about 20 sites that are representative of the data you’ll be parsing.
  2. Manually create an example input/output for each (if you don’t yet have code written, use a spreadsheet to automate things).
  3. Count the number of words in both the input and the example output; the input includes the instructions, the chunk of domains (or a single domain, if you’re processing one at a time), and the description.

For example:

  • Instructions: Examine the domain name and description, return the domain name and one of 6 categories: Infra, Marketplace, Exchange…
  • Data: www.TestWebsite.com “This is a test website with a test description”
  • Anticipated output: www.TestWebsite.com Marketplace
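
Just to make steps 2–3 concrete, here’s a rough sketch of how you might assemble one prompt per spreadsheet row and get an average word count. The file name, column names (“domain”, “description”), and the category list are assumptions for illustration, not a definitive implementation:

```python
# Sketch of steps 2-3: build one prompt per spreadsheet row and count its words.
# The file name, column names ("domain", "description"), and category list are
# assumptions for illustration only.
import csv

INSTRUCTIONS = (
    "Examine the domain name and description, return the domain name "
    "and one of 6 categories: Infra, Marketplace, Exchange, ..."
)

def build_prompt(domain: str, description: str) -> str:
    # Mirrors the Instructions + Data layout in the example above.
    return f'{INSTRUCTIONS}\nData: {domain} "{description}"'

with open("domains.csv", newline="") as f:
    rows = list(csv.DictReader(f))[:20]   # your ~20 representative sites

word_counts = [len(build_prompt(r["domain"], r["description"]).split()) for r in rows]
print("Average input words per site:", sum(word_counts) / len(word_counts))
```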

Concatenate all of that, then count the spaces to get a rough word count, and multiply by about 1.3 to get a rough token count for billing purposes. This is not perfectly accurate, but it’s quick, and since you’re only averaging over a rough sample of 20 sites (out of 6,000), a more precise method probably won’t help much.
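
And the back-of-envelope arithmetic as a script, if it helps. Every number here is a placeholder assumption: the average word count should come from your 20 sample prompts, and the per-token prices from the current pricing page for whatever model you pick:

```python
# Back-of-envelope cost estimate from the word-count heuristic above.
# All of these numbers are placeholders; substitute the averages from your
# 20 samples and your model's actual per-token prices.
AVG_WORDS_PER_SITE = 120                 # assumption: average over your sample prompts
TOKENS_PER_WORD = 1.3                    # rough words-to-tokens multiplier
AVG_OUTPUT_TOKENS = 10                   # assumption: "domain + category" is short
NUM_SITES = 6000

INPUT_PRICE_PER_TOKEN = 0.15 / 1_000_000   # placeholder $ per input token
OUTPUT_PRICE_PER_TOKEN = 0.60 / 1_000_000  # placeholder $ per output token

input_tokens = AVG_WORDS_PER_SITE * TOKENS_PER_WORD * NUM_SITES
output_tokens = AVG_OUTPUT_TOKENS * NUM_SITES
cost = input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

print(f"~{input_tokens:,.0f} input tokens, ~{output_tokens:,.0f} output tokens")
print(f"Estimated cost: ${cost:.2f}")
```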

Hope that helps :slight_smile: