I’m working on an AI app that automatically updates blog posts. It first uses the o1-preview model to generate a list of suggestions. Then, when a user approves the changes, the app uses the same or simpler models to implement them.
The quality of both the suggestions and the changes is very good, and I’m quite satisfied with the results. However, sometimes information needs to be researched before it’s presented to the user. Currently, I rely on the Perplexity API to find information online, but I’ve noticed that its quality is much lower than that of Perplexity Pro results.
So, I’m wondering: is it possible to use a ChatGPT model with a web search feature via the API? I’m an OpenAI tier 5 user.
No. But you can write a search function yourself.
Check my GitHub project Davidasx/flask-gpt, which implements that.
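Roughly, the pattern is to expose your own search function to the model as a tool. Here’s a minimal sketch using the official openai Python SDK; `web_search` is a hypothetical helper you’d back with whatever search provider you prefer, and this isn’t necessarily how flask-gpt is structured:

```python
# Minimal sketch: roll your own web search tool via OpenAI function calling.
# `web_search` is a hypothetical helper backed by your search provider of choice.
import json
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:
    """Return plain-text snippets for `query` (implement with your search API)."""
    raise NotImplementedError

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return relevant snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Find recent sources on <topic> for a blog update."}]
# Note: o1-preview did not support tool calls at the time, so a tool-capable model
# like gpt-4o does the searching; you can hand the gathered text to o1-preview afterwards.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": web_search(args["query"]),
        })
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
```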
So, a question for the broader community. OpenAI is unlikely to expose a search API, and even if they did, it would be RAG-based, which largely doesn’t work very well when reasoning over a large set of web pages.
My company has developed a technology called the Elastic Context Window, which lets us stretch the context window of most models to any length. We have a SearchGPT-like feature that uses Bing Search to fetch web pages we can reason over. If we fetch 20 web pages that total 2 million tokens, we reason over all 2 million tokens, unlike SearchGPT. Yes, that means you have to pay for 2 million tokens, but we’re able to blend gpt-4o-mini with either gpt-4o or o1-preview to reduce the cost. For gpt-4o we can get the cost down to $1 per million tokens flat (in or out), and for o1-preview we can do $5 per million tokens flat (in or out). The caveat is that we have to charge for at least 128k tokens per request to avoid abuse.
The question is… if we surfaced this capability as an API, would people in the community be interested? We add the same basic thinking tokens that OpenAI adds, which amount to around 5%–20% extra tokens per request, but we do so at a significantly reduced cost, and we reason over every single token, unlike SearchGPT.
The other thing to note is that we’re seeing an average token count of around 150k–200k for 20 pages, not 2 million. In fact, a lot of requests can be answered with 5–10 pages and under 100k tokens, but we need to charge a 128k-token minimum to deliver the cost savings.
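To make the pricing concrete, here’s a quick back-of-the-envelope calculation using the numbers above (the flat per-token rates and the 128k-token minimum are the ones quoted in this post; the request sizes are just illustrative):

```python
# Rough cost estimate per request using the rates quoted above:
# $1/M tokens (gpt-4o blend), $5/M tokens (o1-preview blend), 128k-token minimum.
MIN_BILLED_TOKENS = 128_000

def request_cost(tokens: int, rate_per_million: float) -> float:
    billed = max(tokens, MIN_BILLED_TOKENS)
    return billed / 1_000_000 * rate_per_million

print(request_cost(180_000, 1.0))  # ~20 pages, gpt-4o blend      -> $0.18
print(request_cost(180_000, 5.0))  # ~20 pages, o1-preview blend  -> $0.90
print(request_cost(60_000, 1.0))   # small request still bills 128k -> $0.128
```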
How does the quality compare to Perplexity PRO?
I haven’t really tested Perplexity Pro, but our answers are generally as good as or better than the base model we’re using. So if you’re using o1-preview, they’re as good as o1-preview but over a much larger context window (up to several million tokens), and they’re also faster than o1-preview would be at that token length.
The cost savings and speed increases come from routing the bulk of the tokens for the query through gpt-4o-mini. You can think of it as using gpt-4o-mini to scan the context window looking for potentially relevant content. We gather up all that relevant content and then use o1-preview to reason over everything we’ve gathered. The actual algorithm isn’t exactly that simple, but that’s the net effect.
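For those curious, here’s a rough sketch of that scan-then-reason net effect (simplified for illustration; the chunking, prompts, and relevance filter below are placeholders, not our actual algorithm):

```python
# Illustration of the scan-then-reason pattern described above.
# NOT the actual Elastic Context Window algorithm; chunk size, prompts,
# and the relevance filter are placeholder assumptions.
from openai import OpenAI

client = OpenAI()
CHARS_PER_CHUNK = 16_000  # roughly 4k tokens per scan window (assumed)

def chunk(text: str):
    return [text[i:i + CHARS_PER_CHUNK] for i in range(0, len(text), CHARS_PER_CHUNK)]

def scan_for_relevance(question: str, piece: str) -> str:
    """Cheap model extracts only the passages relevant to the question."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nText:\n{piece}\n\n"
                       "Quote only the passages relevant to the question, or reply NONE.",
        }],
    )
    answer = resp.choices[0].message.content
    return "" if answer.strip() == "NONE" else answer

def answer(question: str, corpus: str) -> str:
    # 1) Scan: gpt-4o-mini reads every chunk and keeps only the relevant content.
    relevant = [scan_for_relevance(question, c) for c in chunk(corpus)]
    condensed = "\n\n".join(r for r in relevant if r)
    # 2) Reason: the stronger model answers over the condensed, much shorter context.
    resp = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": f"{condensed}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```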
I say our answers are potentially better than the base model because we don’t generally suffer from the “lost in the middle” issues that plague long-context reasoning. Why? Because our algorithm essentially deletes the irrelevant tokens sitting between the relevant ones. Lost in the middle is a distance issue, and we move everything closer together distance-wise, which results in generally more accurate answers.
I haven’t tested against Perplexity Pro too much because I have our own search engine, but I have tested a fair amount against ChatGPT with search, and we regularly give significantly better answers than ChatGPT.
Also a note on size… the largest query we’ve done is 104 million tokens, where we successfully retrieved 20 hidden passwords from a corpus of 70 million words. It took a little over 4 hours to run and required 28,000 model calls, but we ran that against a fine-tuned version of Llama 3.1 70B on a server with 2 RTX 4090s to save money and avoid getting rate limited by OpenAI.
In practice, though, I’ve seen that most real-world queries break down once the corpus gets above 3–5 million tokens. It’s difficult to describe what happens, but it’s as if the more information the model reads, the more it wants to summarize its answer. For example, you could give our engine an entire website and ask it to summarize the contents. Our algorithm will read every page and generate a nice page-by-page summary in its thinking tokens. But when we go to perform final reasoning over that, the model sees all of it and basically says “yeah, that’s a lot of information,” so its answer ends up being something like “there’s a product page that highlights all of the company’s products and a support page with a number for contacting the company.”
I’m starting to make progress on stretching those types of answers out, but it’s anything but easy. Here’s an example of a recent test case where I used our engine to automatically generate the documentation for a medium-sized code base of ~280,000 tokens. The final output was 9 documents totaling 45,000 tokens, or about 179 pages of text. It’s far from perfect and longer than it should be, but I wanted to see just how much I could get our engine to write…
On the query side of things, here’s a good example of using our engine to query the pandas user guide, which is about 680k tokens long. That was done back in July with gpt-4o (I should update it with o1-preview), but it would make a good comparison for Perplexity Pro.