How to make function calling faster while passing prompt to the api? How to extract current factual information from the api?

I am working on extracting Labour Market statistics for different skillsets in different parts of the world. So trying to build a prompt with few-shot learning which will give a fixed format output in a JSON format. Please find the sequence of code I have written to generate the output.

My queries are :

  1. I have a dataframe which has city column and I am passing this city to the dynamic prompt(mentioned below in the fourth part of the code shown below) and taking the number of jobs and filing in the other column of the dataframe so as of now in the code mentioned below I am passing the prompt & calling the api for each record(city). I have about 100 records and for that it takes about 25 mins. Is there a faster & cheaper way to do it for all the 100 records instead of calling the api everytime for each record.

  2. Is there a better way to pass the dynamic prompt where I can keep the prompt same and just change the city name ?

  3. The code I have written to extract Data Scientist opportunities for a specific city with few-shot learning(In this case I am trying to pass 3 examples to the api to show it how I want to the output in return) giving me an output. But this does not seems to the fastest and optimised version. Could any one please help me make it optimise and show me how the same thing can be done in lesser number of steps.

  4. Is it even possible to extract this type of current factual information from this model, which would be near to real, given that it was trained on the data till 2021.

It would be really helpful if someone can go through my long post and answer my queries. I am not a programmer and trying to learn how to best use this API so it would be really great learning for me if someone can help me with the correct and optimised way to code this.

First part :

sample_response1 = openai.openai_object.OpenAIObject()
sample_response1['role'] = "assistant"
sample_response1['content'] = None
sample_response1['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Chicago',
            'country_name': 'United States',
            'skill_set': 'Data Scientist',
            'number of jobs': 6682
        }
    )
}
sample_response2 = openai.openai_object.OpenAIObject()
sample_response2['role'] = "assistant"
sample_response2['content'] = None
sample_response2['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Halifax',
            'country_name': 'Canada',
            'skill_set': 'Data Scientist',
            'number of jobs': 466,
        }
    )
}
sample_response3 = openai.openai_object.OpenAIObject()
sample_response3['role'] = "assistant"
sample_response3['content'] = None
sample_response3['function_call'] = {
    'name': "labour_market_statistics",
    'arguments': json.dumps(
        {
            'city': 'Gurugram',
            'country': 'India',
            'skill_set': 'Data Scientist',
            'number of jobs': 2408
        }
    )
}

Second part :

function = {
    'name': "labour_market_statistics",
    'description': "A function that takes prompt related with Labour Market statistics and extract information based on that",
    'parameters': {
        'type': "object",
        'properties': {
            'city':{
                'type': "string",
                'description':"The name of the city mentioned in the User content."
            },
            'country_name':{
                'type': "string",
                'description':"The name of the country in which the city is located."
            },
            'skill_set': {
                'type': "string",
                'description': "The skillset for which we want to extract vacancies from the city.",
            },
            'number of jobs':{
                'type': "number",
                'description': "The total number of jobs available in the mentioned city for the skillset mentioned.",
            }
        }},
        'required': ["city", "country_name", "skill_set", "number of jobs"],
    }

Third part :

def chat_completion_request(messages, model=GPT_MODEL,  functions=None, function_call=None, temperature=TEMPERATURE):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions=[functions],
        function_call=function_call,             # this forces calling `function`
        temperature = temperature
    )
    
    content = response["choices"][0]["message"]["function_call"]["arguments"]
    content_json = json.loads(content)
    
    return content_json

Fourth part :

for i in range(len(data)):
    city_name = data['cityLabel'][i]
    dynamic_prompt = 'Please provide the number of Data Science jobs avaiable on LinkedIn for the city of {} for the last one month.'.format(city_name)
    
    messages = [{'role': 'system', 'content': 'You are a Labour Market Officer who wants to gather data about Labour Market statistics. In order to do that, you are currently looking for vacancies available in different skillset on LinkedIn. Extract the relevant data to use as arguments to pass into the given function provided.'},
            {'role': 'user', 'content': 'Please provide the number of Data Science jobs avaiable on LinkedIn for the city of Chicago for the last one month.'},
            sample_response1,
            {'role': 'user', 'content': 'Please provide the number of Data Science jobs avaiable on LinkedIn for the city of Halifax for the last one month.'},
            sample_response2,
            {'role': 'user', 'content': 'Please provide the number of Data Science jobs avaiable on LinkedIn for the city of Gurugram for the last one month.'},
            sample_response3,
            {'role': 'user', 'content': dynamic_prompt}]
    
    chat_response = chat_completion_request(messages, functions=function, function_call={"name": "labour_market_statistics"})
    
    data['api_generated_number'][i] = chat_response['number of jobs']
    city_name = ''
    # print(chat_response)
    
data.head()

I am using gpt-3.5-turbo-0613 model and kept the Temperature as 0 because I require no variation.

Thank you.

@miguelwon @lucas.godfrey1000 @cjmungall
Could you guys please look into my post and make suggestions if you have the answers. Thanks.

Addressing your fourth question first - no, this is probably not the right approach for your use case. For this to work reliably (if at all), you would need to provide your program with a separate API for the ‘factual’ data. To understand what I mean by this, look into LangChain Tools as an example implementation for giving the LLM access to an additional API.

Regarding question 2 - yes, I would look into LangChain Prompt Templates as an example.

Regarding questions 1 and 3 - you could look into fine-tuning, but again I would refer you to my answer to question 4.

1 Like

How big are those “skillsets”? Can you give some examples? Are they tables? In what format? Can you pass them to markdown? To make things faster you can perhaps try give as input several examples/tables with a respective ID and ask it to extract the entities in one call (with the respective ID to keep track).
Additional note: have you consider using simple models? In total, many records do you to process? Perhaps this is not a task for a LLM just as GPT-3.5.

1 Like

Thanks a lot Lucas for going through my long post and the reply. Really appreciate it. About the langchain as a solution, this makes sense. Thanks for pointing in that direction.

Hi @miguelwon
Thanks for going through my long post. I have given the example of this skillset in my first part of the code which i have pasted in my post. It is basically a dataframe(as of now I am reading a csv but I can also move the data to the BigQuery table if required) where a record is basically consist of 4 columns, i.e. city, country_name, skill_set and number of jobs. In this particular case, I am looking for the number of jobs available in last 1 month in a particular city.

Can you pass them to markdown?

I did not get what did you mean by markdown here.

To make things faster you can perhaps try give as input several examples/tables with a respective ID

I am passing 3 sample response for my prompt function as of now. Please refer code pasted in part one above. Do you mean instead of these 3 examples as sample for the prompt I should pass 10 or 15 prompts as an example for the prompt. But what do you mean by a respective id which you mentioned in the end of your sentence ? Could you please show an example of this respective id where you can fetch the data in one call as you are suggesting. It would be a really great help.

I have around 100 to 200 data points for which I am trying to get data from OpenAI api. In this particular case I have about 100 to 200 cities for which I am looking for Data Scientist opportunities available in last one month.