Extracting and summarizing text from filtered structured data using OpenAI Assistants + Code Interpreter

Chris60 · March 20, 2024, 1:08pm

I’m developing a chatbot using the OpenAI Assistant (with a pre-loaded Excel file and Code Interpreter + Retrieval enabled) that distills key points from book summaries in a dataset. Each row in this dataset represents a book and includes details such as the title, ISBN code, publication date, theme, ranking, summary, and more. The chatbot generates insights based on the filters applied in earlier turns of the conversation. For example:

User: How many books about vampires with over 3 stars were published in the last 4 years?
Assistant: (uses Code Interpreter and reads the output) There are 20 books with over 3 stars published in the last 4 years.
User: What are the main differences between those books?
Assistant:
Book 1: <title 64> Summary: This book is about a kid
Book 2: <title 73> Summary: This book is about a family

As you can see, the problem arises when users, having narrowed their search to a specific set of books, ask about the main differences or request summaries to decide which book to read. The Assistant tends to give overly brief responses that lack depth, such as the single line ‘This book is about a kid,’ which isn’t very informative.

Ideally, the Assistant would read the full text of each book’s summary (each ‘Summary’ value is a full page long) and produce a concise yet comprehensive summary of roughly 5-10 lines. However, when using Code Interpreter, the Assistant seems to merely glance at the text in the ‘Summary’ column, presenting only the beginning instead of working through the entire value.
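One way to picture the desired behavior is to hand the model the complete ‘Summary’ text of every filtered row as plain text, instead of letting it work from a printed table preview. The sketch below is only an illustration under assumptions: the column names (`isbn`, `title`, `theme`, `year`, `rating`, `summary`), the sample rows, and the helper names are all hypothetical, and the idea that the one-line answers come from a shortened preview (printed pandas DataFrames, for instance, shorten long strings by default) is a guess, not something confirmed here.

```python
import csv
import io

# Hypothetical sample data; the real dataset's column names may differ.
SAMPLE_CSV = """\
isbn,title,theme,year,rating,summary
9780000000001,Night Kin,vampires,2023,4.2,"A kid moves to a coastal town and slowly learns that his neighbours are vampires."
9780000000002,The Long Dusk,vampires,2021,3.6,"A family inherits an estate, and with it a centuries-old feud."
9780000000003,Garden Notes,gardening,2022,4.8,"A practical guide to growing tomatoes."
9780000000004,Old Blood,vampires,2015,4.0,"A detective hunts a killer."
"""


def load_rows(csv_text):
    """Read CSV text into a list of dicts, one per book."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_books(rows, theme, min_year, min_rating):
    """Keep books with the given theme, published in or after min_year, rated above min_rating."""
    return [
        row for row in rows
        if row["theme"] == theme
        and int(row["year"]) >= min_year
        and float(row["rating"]) > min_rating
    ]


def build_summary_prompt(books, min_lines=5, max_lines=10):
    """Build one prompt string that contains the full summary text of every book."""
    parts = [
        f"For each book below, write a {min_lines}-{max_lines} line summary "
        "based on the full text, then list the main differences between the books."
    ]
    for book in books:
        # The whole summary is included, not a shortened preview.
        parts.append(
            f"ISBN: {book['isbn']}\nTitle: {book['title']}\nSummary: {book['summary']}"
        )
    return "\n\n".join(parts)


if __name__ == "__main__":
    selected = filter_books(load_rows(SAMPLE_CSV), "vampires", 2020, 3.0)
    print(build_summary_prompt(selected))
```

The resulting string would be passed to the model as ordinary message text, so the summarizing happens on the full summaries and the code only handles the filtering.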

I’m seeking suggestions on how to enhance the Assistant’s capability to read and summarize these extensive text summaries more effectively. The goal is to ensure it delivers more detailed and insightful summaries to users after they have narrowed their search. Any advice or recommendations would be greatly appreciated!

I’ve been considering an alternative approach:

Create a text file containing the books’ ISBN codes, titles, and summaries. I would then instruct the Assistant to open this text file whenever the user requests summaries. It would filter this text file using the ISBN codes from the previously filtered dataset and then process the summaries. Here’s an example of what the text file might look like:

ISBN: <ISBN code>
Title: <Book title>
Summary: <Summary of 1 page long>

ISBN: <ISBN code>
Title: <Book title>
Summary: <Summary of 1 page long>
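A minimal sketch of the lookup step for a file laid out like the example above, assuming each record starts with an `ISBN: ` line and that a summary may run over several lines. The function names and sample records are hypothetical and only meant to show the idea.

```python
# Hypothetical sample in the layout shown above; real summaries would be a page long.
SAMPLE_TEXT = """\
ISBN: 9780000000001
Title: Night Kin
Summary: A kid moves to a coastal town.
He learns his neighbours are vampires.

ISBN: 9780000000002
Title: The Long Dusk
Summary: A family inherits an old estate.
"""


def parse_books(text):
    """Parse 'ISBN / Title / Summary' records into a dict keyed by ISBN."""
    books = {}
    current = None
    in_summary = False
    for line in text.splitlines():
        if line.startswith("ISBN: "):
            # A new ISBN line starts a new record.
            current = {"isbn": line[len("ISBN: "):].strip(), "title": "", "summary": ""}
            books[current["isbn"]] = current
            in_summary = False
        elif current is not None and line.startswith("Title: "):
            current["title"] = line[len("Title: "):].strip()
            in_summary = False
        elif current is not None and line.startswith("Summary: "):
            current["summary"] = line[len("Summary: "):].strip()
            in_summary = True
        elif current is not None and in_summary and line.strip():
            # Continuation line of a multi-line summary.
            current["summary"] += " " + line.strip()
    return books


def select_books(books, isbns):
    """Return the records for the given ISBNs, skipping any that are not in the file."""
    return [books[isbn] for isbn in isbns if isbn in books]


if __name__ == "__main__":
    catalog = parse_books(SAMPLE_TEXT)
    for book in select_books(catalog, ["9780000000002"]):
        print(book["title"], "-", book["summary"])
```

The ISBN list would come from the earlier filtering step, and the selected records keep their full summary text for the model to read.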

Does anyone have any other suggestions or another approach that has worked for you?
