Please help me understand why my prompt is not working properly

Hi everyone,

I am writing a prompt to extract project progress information (such as risks, decisions, updates, dependencies) from communication exchanged between employees such as emails, Slack conversations, meeting notes. The prompt explicits the entities and the subentities that should be extracted if available.

I have tested and trained my prompt with a set of dummy examples generated with chatGPT and it is performing well. Interestingly, the prompt demonstrates commendable performance when provided with simpler instructions such as ‘extract risks, decisions, updates from the below email’.

I find myself at an impasse and would greatly appreciate any insights, suggestions, or tips you might have to enhance the effectiveness of my approach.

Thanks a lot!

Prompt:
I will provide emails, messages, or meeting notes from company employees. Your task is to identify project management entities as follows:

(1) Risks:

  • Description
  • Raised by: person raising the risk or sender of the communication.
  • Raised on: date the risk was first mentioned. If not available use the date of the communication.
  • Mitigation plan
  • Assigned to: person responsible for resolving the risk.

(2) Decisions:

  • Description
  • Rationale: more context about why the decision has been made.
  • Who made the decision: person reporting the decision or sender of the communication.
  • Raised on: date the decision was first mentioned. If not available use the date of the communication.

(3) Updates:

  • Description: summary of project status and recent progress.
  • Next week’s priorities
  • Team: team providing the update. If we don’t know it, assume it is the team of the person sharing the update.
  • When: date of the update. If not available use the date of the communication.

(4) Dependencies:

  • Description
  • Mitigation plan
  • Assigned to: person responsible for handling the dependency.

(5) Resources:

  • Name: Name or brief description of the resource.
  • Link

Additional instructions:
1- Format the output as follows:
Project’s name:
(1) Risks: …
(2) Decisions: …
(3) Updates: …
(4) Dependencies: …
(5) Resources: …

2- Organize entities by project if multiple projects are mentioned. Example of output if there are 2 projects A and B:
Project A:
(1) Risks: …
(2) Decisions: …

Project B:
(1) Risks: …
(2) Decisions: …

3- If an entity is not present, then write N/A. For example if project A doesn’t have any risks or dependencies:
(1) Risks: N/A
(2) Decisions: …
(3) Updates: …
(4) Dependencies: N/A
(5) Resources: …


Real life email:

Date 20 Feb 2024
Update on project X

Hi Anant,

  • This presentation shows how we would like the UI to look like with and without input (no sidebar - all info on the same page …) → Please share your feedback on what’s feasible since you know the possibilities of Streamlit. Let’s do a call when you can.
  • On this slide, the questions of the feedback form

My priority today is:

  • (P0) Continue testing and iterating on the prompt to achieve satisfactory results with this golden set. I totally understand that performance might be downgraded with real life examples but it is important to achieve good results with the golden set and then to iterate on the testing set.
  • Write the small paragraph that will be at the top of the UI.
  • Write the guide (google doc) that will be linked to the UI with additional information to users about (1) examples of input (2) how to request help (3) how we handle their data etc …

Considering the challenges we have on the model, we’re not ready to launch tomorrow - here is my suggestion for updated timeline:

21.02.24:

  • Anant finishes the implementation of the components UI.
  • Myriam finalises required text and documentation to be added to the UI

22.02.24:

  • Myriam reaches a satisfactory quality of outputs

23.02.24:

  • UI ready with final prompt and documentations
  • Internal testing of tool (Myriam and Paul)

26.02.24:

  • Outreach to potential external testers

This timeline does not take into account any dependencies on the Tech team @Anant please feel free to adjust it

Thanks,
Myriam


Output with my prompt:

Project X:

(1) Risks: N/A
(2) Decisions: N/A

(3) Updates:

  • Description: Attached logo with no background. Presentation showing the desired UI. Questions of the feedback form.
  • Next week’s priorities: Continue testing and iterating on the prompt, writing the small paragraph for the UI, and writing the guide for users.
  • Team: N/A
  • When: 20 Feb 2024

(4) Dependencies: N/A
(5) Resources: N/A


Expected output

Project X

Risks

  • Description: challenges with the model that hinder readiness for launch.
  • Mitigation plan: Continue testing and iterating on the prompt to achieve satisfactory results with the golden set
  • Raised by: Myriam
  • When: 20.02.24

Decisions

  • Description: updated timeline for the LunaVista project. Launch to external users postponed to Monday 26th Feb.
  • Rationale: model currently not meeting the minimum quality bar.
  • Who took the decision: Myriam
  • When: 20.02.24

Updates

  • Description: team is currently facing challenges with the model’s quality, resulting in delays to the launch timeline, which has been postponed to 26.02
  • Priorities:
    — Anant to finish the implementation of UI components by 21.02.24.
    — Myriam to finalize required text and documentation for UI by 22.02.24.
    — Myriam to reach a satisfactory quality of outputs by 22.02.24.
    — UI to be ready with final prompt and documentation by 23.02.24.
    — Internal testing of tool by Myriam and Paul on 23.02.24.
    — Outreach to potential external testers to commence on 26.02.24.
  • Team: Myriam’s team
  • When: 20.02.24

Resources

  • Presentation of the UI : link

Hi and welcome to the Forum. A couple of observations on top off my head: some of your five dimensions are not clearly defined and somewhat overlapping, which may make it difficult to perform the information extraction properly. Particularly updates and decisions are not outright clear. Updates to me is more a meta/umbrella dimension that encompasses the rest. Resources may also be subject to misinterpretation (from your example it is clear but on first reading it, I thought of human resources). So at a minimum I would add a short definition for these dimensions in your prompt and more clearly delineate them.

Personally, I’d probably try and see if you get better results with a fine-tuned GPT 3.5 model, training it on a set of emails with the corresponding output that you would expect. You might experiment with a very small dataset initially (25-30 examples), just to see if you are yielding the desired results.

1 Like

Thank you very much!

In terms of defining certain entities like decisions or updates, I initially included detailed definitions within my prompt. However, I’ve observed that adding more information to the prompt correlates with a decrease in the quality of the output.

For instance, when using a straightforward prompt like ‘Extract risks, decisions, updates from the email below,’ the resulting output tends to be better (though not perfect) compared to when I provide a more intricate prompt.

Yeah, try to keep it concise and focused when you add this supplementary information. The main point is to reduce ambuigity as much as possible. With the current prompt and dimensions, I am not surprised that the model gravitates towards updates.