Finding Trending News Items in an Archive of Content

ScottFennell · February 6, 2024, 5:59pm

question about the openAI API. I have written a large integration for it that relies totally on the chat completions endpoint. It’s basically a homeless man’s version of chatGPT, but there with you as you are writing a blog post in wordpress.
It works well and has some nice features and has some complexity.

So I have some experience with the api.

What I want to do now is have the api find “gems” from my blog archives: old posts that are suddenly relevant due to recent news trends.

To tell it about recent news trends, I’ll first make an http call to the google trends rss feed and pass the news items as a JSON array to the api.

What I’m less sure about is how to expose, say, my 300 blog posts to the api. it would be absurd to send that whole archive in every api call. surely there’s a way to “store” it somehow in my openAI account?

I’m not sure. Reading the API docs, I don’t see how I’d do this.

I asked the API itself how to do this and it balked:

As of my last update in early 2023, OpenAI's API doesn't directly provide a feature to store or index external data, such as a collection of blog posts, within an user's account for continuous access or querying. Therefore, if you're looking to leverage OpenAI to search or analyze your blog archives based on recent trends or keywords, you would have to implement a workaround or an external solution to manage the search within your blog content effectively.

Here's a potential approach you could consider:

### External Indexing and Search

1. **Index Your Content**: Use a search or database service to index your blog posts for quick searching. Services like Elasticsearch, Algolia, or even simpler database solutions like Firebase or MongoDB could be used depending on your technical preferences and requirements. When setting up your index, include relevant metadata for each blog post such as the date of publication, keywords, and perhaps a summary or the full text depending on what you plan to query against.

2. **Query Based on Trends**: When you identify recent trends from Google Trends or another source, use those keywords or topics to query your blog post index. This search should return a list of blog posts that are potentially relevant to the current trend.

3. **Processing with OpenAI**: For the posts identified in the previous step as potentially relevant, you could then use OpenAI's API to further analyze their content or generate summaries/highlights. This could involve sending portions of the text to OpenAI's API for summarization, sentiment analysis, or even asking the API to rate the relevance of each post to the identified trends. 

4. **Optimize API Usage**: Since sending large volumes of text to OpenAI's API can be costly and inefficient, you'll want to be strategic about what you send. This could involve preprocessing steps to select only the most relevant excerpts from each post or batching requests to analyze multiple posts more efficiently.

### Considerations

- **API Costs and Limits**: Be mindful of OpenAI's usage costs and rate limits. Efficiently processing your data outside of OpenAI before making calls to the API will help manage costs.
  
- **Data Privacy and Security**: When incorporating external services, consider the privacy and security implications of your data. Ensure you're compliant with any relevant regulations or guidelines.

- **Continuous Sync**: If your blog is frequently updated, you'll want to ensure your external index is kept up-to-date. This might involve automated processes to add new posts to your index as they're published.

While this approach requires external tools and some additional setup compared to a native OpenAI storage solution, it offers flexibility and efficiency for processing large volumes of text. Moreover, it leverages OpenAI's capabilities for high-level analysis and insight generation rather than raw data processing, which is a more cost-effective use of the service.

Even if I just try to force the completions API to work with me on this, the results are terrible. Worse than useless. Same with the assistants API, see attached.

Topic		Replies	Views
Newbie definition of terms, API, pricing API	7	1587	March 12, 2024
Integrating OpenAI API for Comprehensive Knowledge of My Web App API	2	202	January 23, 2025
Summarizing or question answering from long Wikipedia articles? API	25	3920	January 4, 2024
Fine tuning the model for our specific use case? API	4	962	December 27, 2023
Crafting a Simple "Zero-Shot Classifier" Using APIs - Seeking Your Insights! API chatgpt , api	15	5169	May 28, 2024

Finding Trending News Items in an Archive of Content

Related topics