I am working on extracting Labour Market statistics for different skillsets in different parts of the world, so I am trying to build a few-shot prompt that returns output in a fixed JSON format. Below is the sequence of code I have written to generate the output.
My queries are:
- I have a dataframe with a city column. I pass each city into the dynamic prompt (shown in the fourth part of the code below), take the number of jobs from the response, and fill it into another column of the dataframe. So, as the code stands, I build the prompt and call the API once per record (city). I have about 100 records and it takes about 25 minutes. Is there a faster and cheaper way to do this for all 100 records instead of calling the API once per record?
- Is there a better way to pass the dynamic prompt, where I can keep the prompt the same and just change the city name?
- The code I have written extracts Data Scientist opportunities for a specific city with few-shot learning (in this case I pass 3 examples to the API to show it the output format I want in return) and it does give me an output, but it does not seem to be the fastest or most optimised version. Could anyone please help me optimise it and show how the same thing can be done in fewer steps?
- Is it even possible to extract this type of current factual information from this model with anything close to real accuracy, given that it was trained on data only up to 2021?
It would be really helpful if someone could go through my long post and answer my queries. I am not a programmer and am trying to learn how best to use this API, so it would be a great learning experience if someone could show me the correct and optimised way to code this.
First part:
import json

import openai

GPT_MODEL = "gpt-3.5-turbo-0613"
TEMPERATURE = 0  # no variation in the output

sample_response1 = openai.openai_object.OpenAIObject()
sample_response1['role'] = "assistant"
sample_response1['content'] = None
sample_response1['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Chicago',
            'country_name': 'United States',
            'skill_set': 'Data Scientist',
            'number of jobs': 6682
        }
    )
}

sample_response2 = openai.openai_object.OpenAIObject()
sample_response2['role'] = "assistant"
sample_response2['content'] = None
sample_response2['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Halifax',
            'country_name': 'Canada',
            'skill_set': 'Data Scientist',
            'number of jobs': 466
        }
    )
}

sample_response3 = openai.openai_object.OpenAIObject()
sample_response3['role'] = "assistant"
sample_response3['content'] = None
sample_response3['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Gurugram',
            'country_name': 'India',  # was 'country'; must match the schema key
            'skill_set': 'Data Scientist',
            'number of jobs': 2408
        }
    )
}
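The three sample responses only differ in their argument values, so I suppose they could also be built in a loop from plain dicts (as far as I know, the ChatCompletion API accepts plain dicts as messages, so OpenAIObject is not strictly required); a sketch of what I mean:

```python
import json

# The three few-shot assistant messages built from a list of example
# argument dicts, instead of three near-identical blocks of code.
examples = [
    {'city': 'Chicago', 'country_name': 'United States',
     'skill_set': 'Data Scientist', 'number of jobs': 6682},
    {'city': 'Halifax', 'country_name': 'Canada',
     'skill_set': 'Data Scientist', 'number of jobs': 466},
    {'city': 'Gurugram', 'country_name': 'India',
     'skill_set': 'Data Scientist', 'number of jobs': 2408},
]

sample_responses = [
    {
        'role': 'assistant',
        'content': None,
        'function_call': {
            'name': 'labour_market_statistics',
            'arguments': json.dumps(args),
        },
    }
    for args in examples
]
```

Is that a reasonable simplification, or is there a reason to prefer OpenAIObject here?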
Second part:
function = {
    'name': "labour_market_statistics",
    'description': "A function that takes a prompt related to Labour Market statistics and extracts information based on it",
    'parameters': {
        'type': "object",
        'properties': {
            'city': {
                'type': "string",
                'description': "The name of the city mentioned in the user content."
            },
            'country_name': {
                'type': "string",
                'description': "The name of the country in which the city is located."
            },
            'skill_set': {
                'type': "string",
                'description': "The skillset for which we want to extract vacancies from the city."
            },
            'number of jobs': {
                'type': "number",
                'description': "The total number of jobs available in the mentioned city for the mentioned skillset."
            }
        },
        # 'required' belongs inside 'parameters' for a valid JSON Schema
        'required': ["city", "country_name", "skill_set", "number of jobs"]
    }
}
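Since I write the parsed arguments straight back into the dataframe, I have been wondering whether I should first check that all the required keys actually came back. A minimal sketch of what I have in mind (`REQUIRED_KEYS` and `validate_arguments` are my own names, not part of the openai library):

```python
# Keys the function schema marks as required.
REQUIRED_KEYS = ["city", "country_name", "skill_set", "number of jobs"]

def validate_arguments(args):
    """Return the list of required keys missing from a parsed arguments dict."""
    return [key for key in REQUIRED_KEYS if key not in args]
```

If the returned list is non-empty, I could skip the record or retry instead of raising a KeyError mid-loop. Is that a sensible precaution?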
Third part:
def chat_completion_request(messages, model=GPT_MODEL, functions=None, function_call=None, temperature=TEMPERATURE):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=[functions],
        function_call=function_call,  # this forces calling `function`
        temperature=temperature
    )
    content = response["choices"][0]["message"]["function_call"]["arguments"]
    return json.loads(content)
Fourth part:
for i in range(len(data)):
    city_name = data['cityLabel'][i]
    dynamic_prompt = 'Please provide the number of Data Science jobs available on LinkedIn for the city of {} for the last one month.'.format(city_name)
    messages = [
        {'role': 'system', 'content': 'You are a Labour Market Officer who wants to gather data about Labour Market statistics. In order to do that, you are currently looking for vacancies available in different skillsets on LinkedIn. Extract the relevant data to use as arguments to pass into the given function.'},
        {'role': 'user', 'content': 'Please provide the number of Data Science jobs available on LinkedIn for the city of Chicago for the last one month.'},
        sample_response1,
        {'role': 'user', 'content': 'Please provide the number of Data Science jobs available on LinkedIn for the city of Halifax for the last one month.'},
        sample_response2,
        {'role': 'user', 'content': 'Please provide the number of Data Science jobs available on LinkedIn for the city of Gurugram for the last one month.'},
        sample_response3,
        {'role': 'user', 'content': dynamic_prompt}
    ]
    chat_response = chat_completion_request(messages, functions=function, function_call={"name": "labour_market_statistics"})
    # .loc avoids pandas' chained-assignment warning when writing back
    data.loc[i, 'api_generated_number'] = chat_response['number of jobs']
    # print(chat_response)

data.head()
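On query 2: since the only thing that changes per record is the city, I could presumably keep one template and substitute the city name into it, rather than writing the sentence out inline. A minimal sketch (`PROMPT_TEMPLATE` and `build_prompt` are my own names):

```python
# One fixed template; only the city name is substituted per record.
PROMPT_TEMPLATE = (
    'Please provide the number of Data Science jobs available on LinkedIn '
    'for the city of {city} for the last one month.'
)

def build_prompt(city_name):
    return PROMPT_TEMPLATE.format(city=city_name)
```

Then the loop body would just call `build_prompt(data['cityLabel'][i])`. Is that the idiomatic way, or is there a better pattern for parameterised prompts?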
I am using the gpt-3.5-turbo-0613 model and kept the temperature at 0 because I require no variation.
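On query 1: since each call spends most of its time waiting on the network, one idea I have been wondering about is firing the requests concurrently with `concurrent.futures` instead of one after another. A sketch, with a placeholder stub in place of the real `chat_completion_request` call (`fetch_jobs` is my own name, not part of the openai library):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_jobs(city_name):
    # Placeholder for the real chat_completion_request call; it just returns
    # a dummy count so this sketch runs without network access.
    return {'city': city_name, 'number of jobs': 0}

cities = ['Chicago', 'Halifax', 'Gurugram']

# API calls are network-bound, so threads can overlap the waiting time;
# pool.map preserves the input order of the cities.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch_jobs, cities))
```

Would something like this be a reasonable way to bring the 25 minutes down, bearing in mind API rate limits?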
Thank you.