I do research for a medical center project. We are developing ideas for the healthcare market and made an interesting chatbot with GPT-3 5-Turbo. So when I had the chance to try the new GPT-4 API, I directly did. The way GPT-4 handles the system details is way better. It really sticks to the role given. Absolutely amazing results. Unfortunately, I realized how much more expensive the GPT-4 API is. Maybe I did something wrong, but tokens added up quickly. So beware when testing your tools on scale. Other than that, it’s the best thing since sliced bread… Cheers!
Hah. Yeah, I noticed last night too with several hours on GPT-4…
I’m sure the price will come down eventually… as the earlier models have… the ChatGPT-turbo costs kinda lets us know as devs that’s what they want us mostly using I think haha…
You probably didn’t do anything wrong, but you may not have done enough to shape your bot to optimize for costs.
In the future – actually now – there will be consultants who specialize in designing systems that use financial boundaries to shape solution requirements and architecture. I’ve been approached by three clients seeking exactly this sort of assistance. My accounting and finance background may actually come in handy after all these years of trying to forget that crap.
We tend to assume that prompts and parameters constitute the limit for controlling costs with GPT. But they’re just the direct elements in the cost envelope. There are many ways to lower AI costs while increasing performance. GPTIndex, embeddings, and fine-tuned models constitute the short list of possibilities.
Yea, I can’t believe there aren’t more people talking about this.
If you do the math, gpt-4 is a 2900% increase over gpt-3.5-turbo. That’s NUTS!
Where can we formally petition the powers that be to bring those costs down asap? Those of us who just made adjustments to product prices have to make them all over again, and it’s really going to hurt our pricing models.
Yep, insane. Unfortunately, for now, it’s unusable for my use case.
But the potential is awesome.
Reading through the thread, I thought “tokens adding up quickly” was due to GPT-4 encodes sentences in more tokens. However, after looking at my experiment results, I realized that I was totally wrong. It is probably just because GPT-4 generates a longer reply.
From my experiment, the number of prompt tokens does not increase when switching from GPT-3.5 to GPT-4, but the response does gets longer if the prompt does not limit the response length. In this case, adding a sentence to ask the AI to be more concise, or even provide a word count limit, should be able to make a difference.
Details of my experiments:
- Prompt: Tell AI to behave as a media literacy assistant. Then, the user provides a message circulating in closed messaging apps. Lastly, the user ask the AI where should they be aware of as an audience. There is no sentence telling AI to be concise. All prompts and responses are in Taiwanese Mandarin.
- Setup: Uses the same set of prompts input on
gpt-4. Each model generates 3 responses (
n = 3in API request).
- Test result: Cofacts prompt engineering on ChatGPT - Google Drive
- I did not store the number of response tokens. But it is obvious that GPT4
- Note that the number of prompt tokens does not differ much. Therefore, I think the token increase is solely from the increase of response length.
One idea I’ve found kinda works, sometimes, is to try GPT-4 to see what is possible, then with that goal in mind enter ‘prompt-engineering hell’ to see how you can coax 3.5 into a close approximation.
All the usual stuff: COT/React, breaking up into smaller steps (if you are using the API) invisible to the end user, even diving into MS guidance or other tools that try to get between tokenization and inference.
This issue is highly significant and distressing. GPT-4 is quite expensive, and as developers, we find ourselves in a challenging situation. Particularly in countries where the exchange rate of the dollar varies significantly, such as Turkey, where I reside.
When I used GPT-4, my billing spiked and multiplied. So I dropped it in favor of GPT-3.5.