Asking for a summary of news article titles and Chat Completion Is Not Able to Summarise

By using OpenAI Chat Completion Python API, I am trying to create a summary of article titles for a newsfeed.
Let say I have 100 ‘Basketball’ news and I want to generate a summary by using these news article titles.
My prompt is like this:

<Act as a journalist and summarise the most important ‘Sports’ news related to ‘Basketball’, in 150 to 200 words based on the article titles below delimited by triple quotes.
“”"

  • some basketball news
  • some transfer
  • a new saloon is built
  • so on…
    “”">

However, it is not able to summarise sometimes, although there are quite a lot of information related to ‘Basketball’ and it answers back as ‘There is no most important Sports news related to Basketball in the provided articles’

How can I make this work?

Hi, can you provide some examples of the code used , the prompts sent and the replies generated?

I see your problem quickly:

From “just headline”:
write 150 word summary.

Where is it supposed to get that summary? You want it to imagine? Hallucinate?

Then you have ambiguous classification phrasing.

I can just ask ChatGPT (using copypasta input that ChatGPT has sanitized already for me), and here it doesn’t go into the realm of pure fiction to create the summary.

It also would need more information for many cases of “based on the article titles”: it might not know what sport a drafted player represents (but got an ambiguous one of mine correct)

The code is something like this (might not be syntactically correct). Despite, I am sharing the prompt without changing:

def generate(self, prompt: str, news: list) -> str:
    prompt = f'Summarise in 150 to 200 words the most important "Industry" news related to topic
"Electronical Fraud", based on articles below delimited by triple quotes.
Do not generate a summary in list format. Create paragraphs instead.
Only provide the generated summary as the response, do not start your responses with an introductory sentence.
If there is no important "Industry" news related to the topic "Electronical Fraud" in the given articles,
please provide only the word "EMPTY" as your response.
\"\"\"
- Obasanjo,IBB Among 25 World’s Richest Politicians
- The Latest: Trump expected to shift toward economy
- Waking Up In Hillary Clinton’s America
- Before Trump, Sen Bulworth Spoke Truth To Power
- Fox stars privately bashed election fraud claims the network pushed
- Rupert Murdoch says some Fox News anchors ‘endorsed’ false election fraud claims in Dominion case
- Republicans Exit ERIC, Citing Voter Fraud Potential
\"\"\"
'
    try:
        messages = [{'role': 'system', 'content': 'Act like a journalist and generate a complete answer in 150 to 200 words.'},
                    {'role': 'user', 'content': prompt}]
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,
            temperature=0.8,
            max_tokens=300)

        result = completion.choices[0].message.content
        return '' if result == 'EMPTY' else result
    except openai.OpenAIError as e:
        logger.error(f'Error occurred while initializing OpenAI API: {e}')
        return None

This returns ‘EMPTY’.

The content of the reply can be extracted with response[‘choices’][0][‘message’][‘content’].

reply = response[‘choices’][0][‘message’]
print(f"Extracted reply: \n{reply}")

reply_content = response[‘choices’][0][‘message’][‘content’]
print(f"Extracted content: \n{reply_content}")

Source: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_stream_completions.ipynb

For non-streamed responses, you can also get response_tokens and see it isn’t 0. Verify “finish_reason”: “stop” as validators of expected output.

I didn’t understand the relation between my question and the answer. I am already using

result = completion.choices[0].message.content

in my code.

And about the streaming part… I don’t have a time constraint but token constraint, and streaming doesn’t help me with the response token I see.

But, about this, you are right indeed. I am aware I am asking to create a summary from titles, which doesn’t carry enough information.

However, it is impossible to predict if I will have a response or not. For the same article titles, sometimes it returns a very valid summary, sometimes I get nothing.

The unreliable response is likely your mishmash of instructions between system and user roles. The needless triple-quotes that don’t delineate the actual headlines as you say they should.

I would articulate clearly in the system role that it is going to operate on user-provided data in the particular way, and only provide the data in the user message.

Chat models should be reliable when the message containers are filled. I just had a babbage case where it returns nothing for a particular input without putting some more carriage returns after the single unlabeled input for completion (and then acted more like an insertions model).

I have used the triple quotes by the suggestion of the short course of prompting by Andrew NG. You can check here, almost all notebooks are using this approach: GitHub - ralphcajipe/chatgpt-prompt-engineering: Jupyter code notebooks of "ChatGPT Prompt Engineering for Developers" by DeepLearning.AI and OpenAI.

About the system and user messages, do you suggest something like this?:

System: You will be given some news article titles. Act like a journalist and summarise them in 150 to 200 words. Do not generate a summary in list format. Create paragraphs instead.
Only provide the generated summary as the response, do not start your responses with an introductory sentence. If you cannot generate a summary, output the word ‘EMPTY’ as your response.

*User: *
- title 1
- title 2
- so on…

About the EMPTY… I have asked this question before and when you instruct with respond with empty string, do not respond with anything etc… it just doesn’t follow. This suggestion worked pretty well so far.

Also, about the system role, if you have checked/followed other discussions in the platform, it is ‘weaker’ than user input, and currently do not behave as intended.

See how delimiters are used and considered ‘best practice’ in the very own guide of openAI OpenAI Platform

That’s the idea. Like code, we have instruction and then we have data.

The performance of the AI can be improved more by telling it exactly what it is actually doing instead of “act like a journalist”. Without fully understanding where you are using this programmatically, we can assume, and write an example system instruction better:

You are the AI processor for my website, which is a news aggregator and only shows basketball stories extracted from a general news feed data source. Additionally, while the news feed only gives headlines written in a terse style, we want to provide the reader with an expanded summary that uses natural language. For each headline that is provided in a list of headlines in the user role message, you will only act on the headline if the language strongly indicates the category is basketball. Then, for each meeting this basketball topic criteria, you will then synthesize and output a paragraph-length summary of the news story, with each story separated by carriage returns. If no story meets the criteria, you will indicate this to backend code by only outputting the text “no basketball news”.

(other code output specifications, other instructions about length and how many stories to consider…)

This is quite a detailed prompt, I can see and have some inspiration, thanks.

just as a thing to try not sure about your single ’ f line, try this

prompt = f'''Summarise in 150 to 200 words the most important "Industry" news related to topic
"Electronical Fraud", based on articles below delimited by triple quotes.
Do not generate a summary in list format. Create paragraphs instead.
Only provide the generated summary as the response, do not start your responses with an introductory sentence.
If there is no important "Industry" news related to the topic "Electronical Fraud" in the given articles,
please provide only the word "EMPTY" as your response.
\"\"\"
- Obasanjo,IBB Among 25 World’s Richest Politicians
- The Latest: Trump expected to shift toward economy
- Waking Up In Hillary Clinton’s America
- Before Trump, Sen Bulworth Spoke Truth To Power
- Fox stars privately bashed election fraud claims the network pushed
- Rupert Murdoch says some Fox News anchors ‘endorsed’ false election fraud claims in Dominion case
- Republicans Exit ERIC, Citing Voter Fraud Potential
\"\"\"
'''

It’s actually like this (as I said things might not be syntactically correct…)

Back in march it would have just acted as a journalist. now you pretty much have to tell it how a journalist acts in great detail. this is not a problem with the users prompt this is due to the deliberate degradation of performance due to unknown reasons

This works (prompt in bold. GPT4 - T=0):

How to select the most newsworthy stories for a basketball fan newsletter:

Each week we mail out the most interesting and newsworthy stories to our subscribers. We select only those stories which will be most interesting and enjoyable for basketball fans.

We usually select 5-8 stories from an original list of 20 news stories from the world of basketball.

Original list:

1. Spain announces extended roster for FIBA Basketball World Cup 2023
2. Tom Segura basketball slam dunk injury video resurfaces in wake of Netflix special ‘Sledgehammer’
3. Doncic leads Slovenia’s extended roster for FIBA Basketball World Cup
4. NBA announces new COVID-19 safety protocols for players and staff
5. LeBron James surpasses Kareem Abdul-Jabbar’s all-time scoring record
6. High school basketball player breaks state record for most points in a single game
7. WNBA announces new pay structure and benefits for players
8. NBA legend Michael Jordan invests in eSports company
9. New basketball training facility opens in Los Angeles
10. NBA player’s controversial tweet sparks debate about mental health in sports
11. College basketball team disqualified from tournament due to COVID-19 outbreak
12. NBA All-Star game to be held without fans due to pandemic
13. NBA player’s shoe line sells out in minutes
14. Basketball coach fired for inappropriate behavior
15. NBA team trades star player in surprising move
16. Basketball player comes out as first openly gay in NBA history
17. NBA player’s charity donates $1 million to local schools
18. NBA team wins championship after dramatic final game
19. NBA player retires after 20-year career
20. Basketball player’s autobiography becomes bestseller

Selected stories:

  1. Spain announces extended roster for FIBA Basketball World Cup 2023 - This story is relevant for fans who follow international basketball and are interested in the upcoming World Cup.

  2. Doncic leads Slovenia’s extended roster for FIBA Basketball World Cup - This story is interesting because it features a popular NBA player, Luka Doncic, who will be playing for his home country in the World Cup.

  3. LeBron James surpasses Kareem Abdul-Jabbar’s all-time scoring record - This is a significant milestone in basketball history and will be of interest to all basketball fans.

  4. NBA team trades star player in surprising move - This story will be of interest to fans who follow the NBA closely and are interested in player trades and team strategies.

  5. NBA team wins championship after dramatic final game - This is a major event in the basketball world and will be of interest to all fans.

  6. Basketball player comes out as first openly gay in NBA history - This is a significant moment in sports history and will be of interest to fans who support diversity and inclusion in sports.

  7. NBA player’s charity donates $1 million to local schools - This story highlights the positive impact that basketball players can have off the court, which can be inspiring for fans.

  8. Basketball player’s autobiography becomes bestseller - This story will be of interest to fans who enjoy learning more about the personal lives and careers of basketball players.

1 Like