Hello, and thank you all for providing support. I am resigned to getting zero human contact from OpenAI; bot responses to two support requests both said to sign up for Team ($600).
I am an old LAMP stack guy, experienced with APIs but new to AI. I want to populate a website with world news, country by country and language by language, daily if possible. Let's say this might require 200-400 automated queries per day.
I am signed up for Plus at $20 per month and can generate news results with ChatGPT 4, unlike with 3.5.
In one place on the site it seemed to say the API was for Enterprise only; however, I have obtained an API key and can see that the API is available to anybody.
In one location on the site, pricing is listed per GB; in another, per prompt token or per sampled token. The text of one result was about 2 KB, and the Firefox page source was about 25 KB. I have no idea what the JSON results would be, likely somewhere in between. Worst case, as a rough guess, 400 × 25 KB would be about 10 MB of downloaded text per day.
I have no free API credits: I had signed up with a different email and didn't use it, but I only have one phone. Still, $5 or $18 is no big deal to try it out.
I want to get a ballpark idea: 1) Can I do this on the Plus plan? 2) What would be the cost range for a working commercial application? 3) What are the definitions of prompt token and sampled token, and are those currently the correct way to measure cost?
Thanks in advance
Mike
Hi @mwpclark - welcome to the Community!
I’ll try to get you started with a few points.
As an initial question: which specific parts of the process are you looking to use the API for? Note that the API is entirely separate from the ChatGPT browser interface, so depending on what you specifically have in mind, the two products might serve different purposes.
For API-related pricing, please have a look here.
As pricing is based on tokens, you’d need to determine the average number of input (prompt) and output (completion) tokens to estimate the potential costs.
Just to get an initial sense, you can use the tokenizer here: https://platform.openai.com/tokenizer
For ChatGPT Plus, your monthly cost is fixed at USD 20.
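As a rough illustration of the token-based pricing described above, here is a short Python sketch. The per-1K-token prices and per-query token counts below are illustrative placeholders, not actual OpenAI rates; check the pricing page for current figures.

```python
# Rough monthly cost estimate for token-based API pricing.
# All rates and token counts here are illustrative assumptions,
# NOT actual prices - check the official pricing page.

def monthly_cost(queries_per_day: float,
                 prompt_tokens: float,
                 completion_tokens: float,
                 price_prompt_per_1k: float,
                 price_completion_per_1k: float,
                 days: int = 30) -> float:
    """Estimate monthly cost given average token usage per query."""
    per_query = (prompt_tokens / 1000 * price_prompt_per_1k
                 + completion_tokens / 1000 * price_completion_per_1k)
    return queries_per_day * per_query * days

# Example: 400 queries/day, ~50 prompt + ~250 completion tokens each,
# at hypothetical rates of $0.0015 / $0.002 per 1K tokens.
cost = monthly_cost(400, 50, 250, 0.0015, 0.002)
print(f"${cost:.2f} per month")
```

Even at several hundred queries a day, short prompts and modest completions keep the arithmetic in the single-digit-dollars-per-month range under these assumed rates, which is why measuring your actual average token counts first is worthwhile.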
Thanks JR. A quick test showed roughly 150-300 tokens for “today’s news from …”, mostly response tokens.
Roughly double that for translation into a non-Roman language.
“Today’s headlines, no opinion, no conclusion” shortened it a lot, including for non-Roman languages!
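For quick back-of-the-envelope checks like the one above, a common rule of thumb is roughly 4 characters per token for English text; this is only a heuristic (the tokenizer linked earlier gives exact counts), and non-Roman scripts often tokenize far less efficiently, which matches the roughly-doubled counts observed here.

```python
def rough_token_estimate(text: str) -> int:
    # Heuristic only: ~4 characters per token holds for typical English.
    # Non-Roman scripts (and code) can use far more tokens per character,
    # so treat this as a lower-bound sanity check, not a billing figure.
    return max(1, len(text) // 4)

print(rough_token_estimate("today's headlines no opinion no conclusion"))
```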
One thing to note is that you’re not going to get current news headlines from the API. That feature is specific to ChatGPT and relies on an integration with Bing; it’s not exposed via the API.
Interesting, stevenic, thanks. That’s probably by design: gotta view those Bing ads. I’m guessing there is no workaround, so my best bet may be to do the grunt work and keep fixing and replacing my current RSS feeds as they go bad.
Of course there is: you can use a self-hosted open-source model. But it is way more expensive.
Hi again @mwpclark - early last year I implemented an end-to-end automated platform that sources news/updates from organizations across the world within a particular domain and publishes their summaries on a website. Here’s what my process flow looks like and where I use the OpenAI API:
- Sourcing of updates: Relevant websites are scraped, and only new entries are extracted (I personally don’t rely on RSS feeds, but you could technically apply the same logic to them). Where possible I extract content in the original language to avoid information loss.
- Pre-processing: If necessary, the tool performs additional pre-processing such as basic text cleaning and, if applicable, translation to English. In my case, I use a translation API for this step.
- Transformation: The OpenAI API is used to generate summaries of the news in a consistent style and format. Additionally, it relies on fine-tuned OpenAI models to classify the news along multiple dimensions.
- Storage: The raw extracted news, the output of the transformation steps, and other contextual data (e.g. name of organization, region, country, date of publication) are stored in a relational database.
- Publication on website: The website fetches the latest data from the database via a custom-built API every few minutes. The front-end is designed to allow custom filtering.
All five of these steps run fully automatically, 24/7.
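To make the transformation step concrete, here is a minimal sketch of building a summarization request in the Chat Completions JSON shape. The model name and prompt wording are my own illustrative assumptions, not the poster's actual setup, and the network call itself is omitted so the structure stays clear.

```python
import json

# Sketch of the "Transformation" step: turning a raw article into a
# summarization request body for the Chat Completions API.
# Model name and prompt text are illustrative assumptions.

def build_summary_request(article_text: str,
                          model: str = "gpt-3.5-turbo") -> dict:
    """Build the JSON-serializable request body for one summary."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the news article in 2-3 neutral "
                        "sentences. No opinion, no conclusion."},
            {"role": "user", "content": article_text},
        ],
    }

request_body = build_summary_request("Example article text goes here.")
print(json.dumps(request_body, indent=2))
```

You would POST this body to the chat completions endpoint with your API key; the response's `usage` field reports `prompt_tokens` and `completion_tokens`, which is handy for tracking the per-query costs discussed earlier in the thread.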
Perhaps this gives you an additional idea on how to approach the problem.
Thanks @jr, interesting stuff. I wrote this news site in 2016. It is primarily PHP, but uses Perl to run wget, grab RSS news feeds, and store them in MariaDB. The system runs nightly on a cron job, untouched by human hands, and has made it through a couple of server moves. Viewers can select a country or region.
There were almost 500 feeds to start with, but maintenance is an issue: feeds change, break, and disappear. I am continuing to search for working news feeds, as well as looking for methods, such as AI, to at least let me know when problems arise. All with minimum investment, of course.
Another issue is that some publications have an RSS feed but require visitors to create an account, often a paid one, in order to read the full article. I see this end result with aggregators such as Google News and Yahoo News all the time.
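The "let me know when problems arise" part doesn't necessarily need AI: a simple health check that flags feeds which fail to parse or contain no items would catch most breakage. A stdlib-only sketch (fetching via wget/urllib is omitted; this checks already-downloaded XML, and the helper name is my own):

```python
import xml.etree.ElementTree as ET

# Sketch of an automated RSS feed health check: flag feeds that fail
# to parse or contain no <item> entries, so a human can investigate.
# Fetching the feed (wget/urllib) is assumed to happen elsewhere.

def feed_looks_healthy(xml_text: str) -> bool:
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # broken XML: feed is dead or changed format
    # RSS 2.0 nests <item> under <channel>; search anywhere to be lenient.
    return len(root.findall(".//item")) > 0

good = "<rss><channel><item><title>Hello</title></item></channel></rss>"
bad = "<html>Login required</html>"
print(feed_looks_healthy(good), feed_looks_healthy(bad))
```

Run nightly alongside the existing cron job, anything returning False goes into a report for manual review, which keeps the human effort down to fixing feeds that actually broke.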