How to refine a prompt when there is a lack of data

Hello,
I am trying to assign skills from a predefined list to course descriptions, and I want the output to match my manually labelled data, which serves as the ground truth. However, I only have around 60 labelled courses, so I am wondering how I can refine my prompts to produce results closer to the ground truth. I have tried zero-shot and few-shot prompting, but neither worked well.

An example would be:
Course Description:

The skills assigned manually (ground truth): agile methodology

I am trying to do this because I cannot label every course manually; it takes too much work.

Welcome to the forum.

What model and settings are you using?

Can you share your system / user / assistant prompts?

I am using GPT-4 with the chat completions endpoint.

{"role": "system", "content": "You are a helpful assistant for skills extraction from learning content."}

Below is my user prompt:
prompt = """
The course to assign skills to from {skills_list} is below:
{course}
Take a step-by-step approach in your response, give short reasoning before sharing final answer in ONE PYTHON LIST ONLY.
"""
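Since the prompt asks for reasoning followed by one Python list, the reply has to be parsed before it can be compared with the labelled data. A minimal sketch of that step, assuming the final answer is the last bracketed list in the reply; `extract_final_list` and the sample `reply` are hypothetical, not part of the original setup:

```python
import ast
import re

def extract_final_list(reply):
    """Return the last list of strings found in a model reply, or [] if none parses."""
    # Find every [...] span without nested brackets; the final answer should be the last one.
    candidates = re.findall(r"\[[^\[\]]*\]", reply)
    for candidate in reversed(candidates):
        try:
            value = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue  # not a valid Python literal, e.g. "[note]"
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            return value
    return []

# Hypothetical reply text, for illustration only.
reply = 'The course covers sprints and backlogs.\nFinal answer: ["agile methodology"]'
print(extract_final_list(reply))
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to plain literals, which is safer for model output.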

I would try a one-shot or few-shot example …

You are a helpful assistant for skills extraction from learning content.

This is kinda confusing and open-ended. Try to be more detailed…

You are a helpful assistant for skills extraction from learning content. Assist in identifying and extracting key skills and competencies from educational materials. Analyze the content to pinpoint practical and theoretical knowledge elements that learners should acquire, ensuring a comprehensive mapping to relevant skill frameworks or professional standards where applicable.

Then give one user message with the COURSE and an assistant message with the answer you want. Then send the new user prompt, and the output should mirror your example better.

skills_list = ["problem-solving", "critical thinking", "teamwork", "communication", "data analysis", "programming", "project management"]

course = """
This course provides an introduction to project management principles and practices. Students will learn how to initiate, plan, execute, and close projects efficiently while working in teams. The course also covers effective communication strategies and critical thinking to solve project-related problems.
"""

Use prompt example…

Given the list of skills {skills_list}, identify which are relevant to the following course:
{course}
Provide a step-by-step rationale for each skill selected and present the final answer in a single Python list.

Then set this as an assistant message…

Based on the course description, the following skills from the provided skills_list are applicable:

  • “project management”: The course focuses on project management principles and practices.
  • “teamwork”: It emphasizes working in teams to execute projects.
  • “communication”: Effective communication strategies are covered in the course.
  • “critical thinking”: The course includes solving project-related problems.
  • “problem-solving”: This is implicitly part of managing and executing projects.

The final list of relevant skills for the course, as a single Python list, is:
["project management", "teamwork", "communication", "critical thinking", "problem-solving"]

Then your original USER prompt… and it should mirror the example better…
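Put together, the message order described above could look roughly like the sketch below. It only builds the list of messages and does not call any API; `build_messages`, `USER_TEMPLATE`, and the placeholder course texts are hypothetical names used for illustration. Both user turns use the same template so the example and the real request look identical to the model.

```python
SYSTEM_PROMPT = "You are a helpful assistant for skills extraction from learning content."

# Same wording for the example turn and the real turn (assumed template).
USER_TEMPLATE = (
    "Given the list of skills {skills_list}, identify which are relevant to the following course:\n"
    "{course}\n"
    "Provide a step-by-step rationale for each skill selected and present the final answer "
    "in a single Python list."
)

def build_messages(skills_list, example_course, example_answer, new_course):
    """Return a one-shot conversation: system, example user, example assistant, new user."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_TEMPLATE.format(skills_list=skills_list, course=example_course)},
        {"role": "assistant", "content": example_answer},
        {"role": "user", "content": USER_TEMPLATE.format(skills_list=skills_list, course=new_course)},
    ]

# Placeholder inputs, for illustration only.
messages = build_messages(
    skills_list=["project management", "teamwork"],
    example_course="An introduction to project management principles.",
    example_answer='["project management"]',
    new_course="A workshop on collaborating in small teams.",
)
print([m["role"] for m in messages])
```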

However, with this approach I do not have enough ground truth data to do this for every course. How should I approach it, since I will not have an assistant answer for every course?

It should be able to infer what you want by giving it one concrete example… You can even send the same example each time… Give it a try, and let us know?

I have just tried it, but the answers given by the model are still quite far from the ground truth. Any other suggestions?
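To put a number on "far from the ground truth", one option is to compare each predicted list with the manual labels as sets. This is only a sketch, assuming exact string matches between predicted and labelled skills; `score` and the sample lists are hypothetical:

```python
def score(predicted, truth):
    """Return (precision, recall) of predicted skills against the manual labels."""
    pred, gold = set(predicted), set(truth)
    overlap = len(pred & gold)
    # Precision: share of predicted skills that are in the ground truth.
    precision = overlap / len(pred) if pred else 0.0
    # Recall: share of ground-truth skills that were predicted.
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical values, for illustration only.
predicted = ["project management", "teamwork", "agile methodology"]
truth = ["agile methodology"]
precision, recall = score(predicted, truth)
print(precision, recall)
```

Low precision with high recall would mean the model assigns too many skills, which calls for a different prompt change than the opposite case.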

What exactly do you mean by this?

Can you show the prompts you sent and response you got?

It means I have some courses that already have skills assigned to them, but they are limited in quantity.

"messages": [
    {"role": "system", "content": "You are a helpful assistant for skills extraction from learning content. Assist in identifying and extracting key skills and competencies from educational materials. Analyze the content to pinpoint practical and theoretical knowledge elements that learners should acquire, ensuring a comprehensive mapping to relevant skill frameworks or professional standards where applicable."},
    {"role": "user", "content": f"{course_desc}"},
    {"role": "assistant", "content": f"{manual_assigned_answer}"},
    {"role": "user", "content": f"{new_course_desc}"}
]

new_course_desc refers to a separate course description.

Right. I’m saying just one of those as the example. Then, when you ask for a second one, it knows what you’re wanting more closely. Or should!

Yep, I get what you mean, but unfortunately the output is still not as desired. Do you have any other suggestions?