Assistants API: Analyzing a dataframe with long-text features using Code Interpreter

Hello! I’m currently analyzing a dataframe that I’ve uploaded to the Assistant, engaging in a conversational analysis. However, I encounter a problem when I request summaries of features that contain long text.

For instance, the dataset pertains to books, with each entry representing a book and including details such as the title, ISBN code, publication date, theme, ranking, summary, and more. After engaging in a brief conversation, the Assistant has successfully filtered the dataset to include only books themed around vampires, with rankings over 4 stars, and published within the last two years, narrowing it down to two books.
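To make the filtering step concrete, here is a minimal pandas sketch of roughly what the Assistant ends up running in Code Interpreter. The column names (`title`, `theme`, `ranking`, `publication_date`, `summary`), the sample rows, and the `filter_books` helper are all made up for illustration; my real dataset differs, and the cutoff date is fixed only so the example is reproducible.

```python
import pandas as pd

# Hypothetical sample data standing in for the real books dataset.
books = pd.DataFrame({
    "title": ["Night Court", "Crimson Hour", "Old Fangs", "Garden Tales", "Blood Moon"],
    "theme": ["vampires", "vampires", "vampires", "gardening", "vampires"],
    "ranking": [4.5, 4.2, 4.8, 4.9, 3.2],
    "publication_date": pd.to_datetime(
        ["2023-03-01", "2022-11-12", "2015-06-10", "2023-01-15", "2023-05-20"]
    ),
    "summary": ["placeholder summary"] * 5,
})


def filter_books(df, theme, min_ranking, published_after):
    """Keep rows matching the theme, ranked above min_ranking, published on or after the cutoff."""
    mask = (
        (df["theme"] == theme)
        & (df["ranking"] > min_ranking)
        & (df["publication_date"] >= published_after)
    )
    return df[mask]


# Fixed cutoff standing in for "the last two years".
cutoff = pd.Timestamp("2022-01-01")
selected = filter_books(books, "vampires", 4, cutoff)
print(selected["title"].to_list())
```

With this sample data, only the two recent, highly ranked vampire books remain, which mirrors the point the conversation reaches before the summaries are requested.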

When the user requests summaries of these two books, the Assistant fails to provide the summaries from the dataset for both books. I am uncertain if this issue arises because the summaries are too lengthy, or if there is a processing error within the Code Interpreter instance. The summaries exist within the dataset, and I would like the Assistant to present these two summaries to the user. I have encountered several issues when attempting this:

  • The Assistant provides a one-line summary for each book that lacks detail (e.g., “The book is about a kid”), even when the summary in the dataset is between 500 and 1000 characters long.
  • The Assistant generates inaccurate information even though the summary is present in the dataset, a problem known as “hallucination.”

I would appreciate any suggestions on how to address this issue. Ideally, the Assistant should provide the exact summary from the dataset or perhaps a concise version of the summary feature.
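To show the behaviour I am after, here is a small sketch that pulls the summary text out of the filtered rows verbatim. The `format_summaries` helper and the sample data are hypothetical. One thing that may be worth ruling out (an assumption on my part, not something I have confirmed) is that the Assistant only sees the abbreviated output of printing the dataframe, since pandas shortens long text cells when it displays a frame under its default settings.

```python
import pandas as pd

# Hypothetical filtered result: two books with long summaries (over 500 characters each).
selected = pd.DataFrame({
    "title": ["Night Court", "Crimson Hour"],
    "summary": [
        "A " + "very " * 120 + "long summary.",
        "Another " + "quite " * 100 + "long summary.",
    ],
})


def format_summaries(df):
    """Return each title followed by its full summary string, read directly from the cells."""
    blocks = []
    for title, summary in zip(df["title"], df["summary"]):
        # Reading the cell values directly avoids the shortened view used for table display.
        blocks.append(f"{title}\n{summary}")
    return "\n\n".join(blocks)


full_text = format_summaries(selected)
print(full_text)
```

If the code run by the Assistant returned text like this instead of a printed table, the model would at least have the complete summaries available to quote or condense.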


It’s probably not the issue, but this thread comes to mind:

What model are you using?

gpt-3.5-turbo-1106 (I’m not currently using GPT-4 because of the cost)