How do you forecast what an AI feature will actually cost — before you build it?

When I shipped my first AI feature, I budgeted by gut feel, and the real bill at scale was nothing like my estimate — re-embedding, retries, and a too-expensive default model quietly added up. I’d have made very different architecture choices if I’d seen the numbers up front.

So I’m curious how others here handle it: do you forecast cost before building, or find out after? What surprised you most — embeddings, the vector DB, model choice, or retries? Trying to learn how people approach this earlier in the process.

While building (at the drawing board) you have a per-call estimate, then you compose and test. Staging then gives you per-run estimates (optimistic). Basically, if subjective complexity factor (0-5) * cost per run * 50 < customer cost per run… that is the point where you ask yourself 2 questions:

  1. Can you divide the costs at least by 2 (better 4)?
  2. Is it worth bothering?
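
A minimal sketch of that check, assuming it is read as "the price charged per run should exceed complexity * cost per run * 50, and the two questions come up when weighing that result". The function name, argument names and example figures are made up for illustration.

```python
def margin_check(complexity: float, cost_per_run: float,
                 price_per_run: float, multiplier: float = 50.0) -> dict:
    """Compare the price charged per run with complexity * cost * multiplier."""
    if not 0 <= complexity <= 5:
        raise ValueError("complexity is a subjective factor between 0 and 5")
    required = complexity * cost_per_run * multiplier
    return {
        "required_price": required,
        "passes": required < price_per_run,
        # Cost per run if question 1 ("divide by 2, better 4") works out.
        "cost_if_halved": cost_per_run / 2,
        "cost_if_quartered": cost_per_run / 4,
    }


# Illustrative numbers only: complexity 2, 1 cent per run, 2 USD charged per run.
result = margin_check(complexity=2, cost_per_run=0.01, price_per_run=2.0)
print(result)
```

Swapping `multiplier` for 100 gives the stricter version of the same test.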

I just build, deploy, monitor and tune iteratively.

In Production, if you can limit the population that is processed initially you can work out the cost for a smaller set, then optimise and finally broaden the population when you are satisfied with the behaviour and cost on the smaller set.
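
One way to limit the population is a deterministic hash bucket, so the same entities stay in the pilot set while you tune and the set only grows when you raise the percentage. This is a sketch under assumptions: the `property-N` ids, the 5% figure and the cost numbers are hypothetical.

```python
import hashlib


def in_rollout(entity_id: str, percent: float) -> bool:
    """Deterministically place an entity in the first `percent` of the population."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def extrapolate_cost(measured_cost: float, measured_count: int, full_count: int) -> float:
    """Linear extrapolation from the small set; ignores tier jumps and other non-linear effects."""
    return measured_cost / measured_count * full_count


property_ids = [f"property-{i}" for i in range(1000)]
pilot = [p for p in property_ids if in_rollout(p, percent=5)]
print(len(pilot), "properties in the pilot set")
print(extrapolate_cost(measured_cost=2.5, measured_count=5, full_count=1000))
```

Because the bucket depends only on the id, broadening from 5% to 20% keeps every entity that was already processed.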

Personally, I saw cost exploding when the context window was not thought through. Stacking and compressing is not the best approach (at least for what I usually do).

I prefer what I call “composition”, where the context is processed in small tasks separately, then results from there are assembled into “answer context” and finally the model gets you the answer you attach to the conversation.

But it’s me who pulls the context from the conversation, so I have more control there.
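
A rough sketch of what "composition" could look like, assuming a generic `model` callable that takes a prompt and returns text. The prompts, the function names and the `fake_model` stand-in are hypothetical, not the poster's actual implementation.

```python
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical: takes a prompt, returns text


def compose_answer(question: str, context_parts: List[str], model: Model) -> str:
    # 1. Process each piece of context separately, in a small task of its own.
    partials = [
        model(f"Extract what is relevant to: {question}\n\n{part}")
        for part in context_parts
    ]
    # 2. Assemble the partial results into a compact "answer context".
    answer_context = "\n".join(f"- {p}" for p in partials)
    # 3. The final call sees only the assembled context, not the raw history.
    return model(f"Answer using only these notes:\n{answer_context}\n\nQuestion: {question}")


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[{len(prompt)} chars processed]"


print(compose_answer("Is parking included?", ["listing text", "house rules"], fake_model))
```

The cost lever is that each call carries a small prompt, instead of every call carrying the whole stacked conversation.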

Here is a recent build as an example.

Goal: build high quality image alt text for a property image in the vacation rental industry.

Inputs: property data payload, high res image url (public)

Model: 5.4-nano

Approach:

  1. Resize image to 1024 largest side
  2. Extract all visible elements (slightly biased by property description)
  3. Reprioritize the image's main focus items based on the property's booking decision triggers (again biased by the property description plus main selling features, strongly this time)
  4. Suggest candidates for image alt text (biased by guest search intent and a static keyword-volume lookup table)
  5. Assemble the final alt text.
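
Step 1 comes down to computing target dimensions before handing the image to whatever library does the resizing. A small sketch, assuming the aspect ratio is kept and smaller images are never upscaled (both are assumptions, the post only says "1024 largest side"):

```python
from typing import Tuple


def fit_largest_side(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return (width, height) with the largest side capped at max_side, aspect ratio kept."""
    largest = max(width, height)
    if largest <= max_side:
        return width, height  # assumption: never upscale
    scale = max_side / largest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_largest_side(6000, 4000))
print(fit_largest_side(800, 600))
```

Capping the input size before the first call keeps the image-token part of the run predictable across properties.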

With prompts, inputs and outputs, we are talking about 35K tokens total per run to get:

  1. 5 paragraphs of raw image description (non-public result, not usable outside the workflow)
  2. 3 paragraphs of high quality (commercial) image description (useful as input for room description generation, social media post generation or image selection)
  3. 5 alt text candidates (useless unless kept for later fine-tuning)
  4. 1 alt text deliverable: 5-12 words

Basically, between cached inputs (static prompt + property description always first) and other API costs, we are at approx. 1 cent per run if you run all images for a single property in parallel (properties average 30-60 images).

So the cost to get all image alt texts for a property is around 0.5 USD.

Sold at about 2 USD/month/property… wait? Is it X4 only? No, it’s a yearly subscription and properties get photographed about once a year. With some rotation you get somewhere close to X20 between all your costs and the revenue.
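
The arithmetic behind those multiples, written out using only the figures quoted in this post. The variable names are mine, and the X20 figure is not reproduced because it also absorbs costs and rotation that the post does not itemize.

```python
cost_per_run_usd = 0.01          # approx. 1 cent per image
images_per_property = (30, 60)   # typical range
price_per_month_usd = 2.0

low, high = (n * cost_per_run_usd for n in images_per_property)
cost_per_property_usd = 0.5      # the rounded figure used in the post

monthly_multiple = price_per_month_usd / cost_per_property_usd
annual_multiple = price_per_month_usd * 12 / cost_per_property_usd

print(f"cost per property: {low:.2f}-{high:.2f} USD")
print(f"one month of revenue vs one full run: x{monthly_multiple:.0f}")
print(f"one year of revenue vs one full run:  x{annual_multiple:.0f}")
```

The gap between the annual multiple and X20 is the room left for everything that is not a per-run cost.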

For me, the above math is risky… but I have the distribution channels and at least 200 enterprise accounts + all the decision-making contacts to sell to. So the risk is acceptable.

Hope it exposed some logic I use for costs.

“Limit the population, cost it on a small set, then broaden” is basically forecasting with real data instead of guesses — solid. The only place it bit me was non-linear scaling: the small set looked fine, then a vector-DB tier and context growth changed the unit economics at the top end. Do you sanity-check the scaled-up number, or mostly trust the small-set extrapolation?
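
To make the non-linear part concrete, here is a toy comparison of a straight-line extrapolation against step-wise tier pricing. The tiers and prices are hypothetical and do not reflect any vendor's real pricing.

```python
from bisect import bisect_left

# Hypothetical tiers: (max rows covered, monthly price in USD). Not real vendor pricing.
TIERS = [(100_000, 25.0), (1_000_000, 95.0), (10_000_000, 450.0)]


def monthly_db_cost(rows: int) -> float:
    """Price of the smallest tier whose row limit covers `rows`."""
    limits = [limit for limit, _ in TIERS]
    index = bisect_left(limits, rows)
    if index == len(TIERS):
        raise ValueError("row count exceeds the largest modelled tier")
    return TIERS[index][1]


def linear_extrapolation(pilot_rows: int, target_rows: int) -> float:
    """What a small-set, per-row extrapolation would predict."""
    return monthly_db_cost(pilot_rows) / pilot_rows * target_rows


pilot_rows, target_rows = 50_000, 120_000
print("tiered:", monthly_db_cost(target_rows))
print("linear guess:", linear_extrapolation(pilot_rows, target_rows))
```

Running the tiered function at the target row count, not just at the pilot size, is the cheap way to catch the jump before it shows up on an invoice.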

This is exactly the math I wish more people ran before building — the complexity-factor × cost × 50 check and the X20 sanity test are sharp. The thing your per-run breakdown doesn’t touch, though, is the cost that isn’t per-run: re-embedding when your data or model changes, retries on the hard cases, a vector DB that jumps a pricing tier past some row count. Those never show up in a per-call estimate but they’re what blew my forecast. Do you fold those recurring/one-time costs into your model, or just eat them as they come?

In my example, those go into the difference between the “ideal” x48 and the “conservative” x20, which still leaves some margin to end up at X4 in the “pessimistic” case.

That’s basically the reason for choosing x complexity factor x50 (ideally x100 if possible) in the formula.

As for retries, embeddings etc., those need to be thought through: not everything needs a retry, some inputs may need to be deterministic (store outputs on an input hash key), reconsider your vector DB (Weaviate vs. the “best” crap), cache embeddings on an input hash, and there are some other techniques.
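
A minimal sketch of the "store on input hash" idea, which applies equally to deterministic outputs and to embeddings. The in-memory dict and the `fake_embed` function are stand-ins I made up; a real setup would use a persistent store and an actual embedding call.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def input_key(payload: Any) -> str:
    """Stable hash of a JSON-serialisable input."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class HashCache:
    """In-memory cache keyed by input hash; swap the dict for a real store in production."""

    def __init__(self, compute: Callable[[Any], Any]) -> None:
        self._compute = compute
        self._store: Dict[str, Any] = {}
        self.misses = 0

    def get(self, payload: Any) -> Any:
        key = input_key(payload)
        if key not in self._store:
            self.misses += 1  # only a miss triggers a paid call
            self._store[key] = self._compute(payload)
        return self._store[key]


def fake_embed(text: str) -> list:
    """Stand-in for an embedding call so the sketch runs offline."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


embeddings = HashCache(fake_embed)
for text in ["ocean view", "ocean view", "pool", "ocean view"]:
    embeddings.get(text)
print("paid calls:", embeddings.misses)
```

If the model or prompt version changes, adding that version to the hashed payload keeps old and new results from colliding.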

Makes sense — folding it into the x20/x48 spread is cleaner than itemizing every recurring cost, as long as the margin’s fat enough to absorb a surprise. The deterministic-output-on-input-hash point is the one I see people miss most: half the “retry cost” problem disappears the moment you realize a chunk of your inputs repeat and never needed a fresh call at all. Same with caching embeddings on a hash key — cheap to do, and it quietly kills the re-embedding bill.

The one that still gets me is the vector DB. Weaviate vs the “best” crap aside, the trap is the tier jump — everything’s fine until some row count or QPS threshold flips you into a pricing bracket you didn’t model, and suddenly your x20 margin is x6. Do you pin a target DB + tier early and design backward from its ceiling, or stay portable and pick once the access patterns are real?

Hi @shokhrukhkarimov!

This is the official OpenAI Community Forum, and it is not intended for customer acquisition through direct messages.

You can add a link to your profile, but please do not post advertisements or try to move other community members away from the platform.

Thank you for understanding.

Weaviate pricing to check: Vector Database Pricing | Weaviate

It powers a recommendation engine based on user history on Beat of Hawaii (the recommended posts, https://beatofhawaii.com ): 5k posts, 1024-dimension embeddings, around 2M unique visitors per month… Originally we were at around 75 USD/month, but they recently updated their pricing and plans, so we dropped to 45 USD/month. 3+ years of operation, no issues so far.

I’m not so sophisticated. I eventually just follow the final number and intervene as necessary.