Dataset format for my own data from database

I have a project where I need to read emails and build a kind of mapper that relates information from each email to information in two database tables. I think fine-tuning is the best option, but I can't figure out how to give the model the information from my tables so that, when I send it an email in plain text, it returns the related data from my database. How should I structure the training file with my database data?
For example, I have the table customers: {'table_name': 'customers', 'accountNumber': '94571', 'accountName': 'Test', 'specialInstructions': 'Instructions', 'countries_indicative': 44}. How can I create the JSONL file for the customers table?

I appreciate every opinion, thanks.

Welcome to the community!

I’d like to advise you that fine-tuning with your database data will not allow the model to recall that data. You can use fine tuning to modify behavior to an extent, but it doesn’t really affect knowledge.

I’d recommend a reevaluation of your process: how does a human solve this particular task? I imagine you might receive an email, and then perform a query to figure out the special instructions.

Composing a prompt that instructs the model to perform these actions (i.e. generate a query) and then generate a response based on the query result will likely be a more fruitful pursuit, at least in my experience :)


So isn't it possible to train on data from my database and have the AI return that data to me? I already have a process that takes the emails and a process that gets a JSON from GPT with the data I need, but that data comes from the email itself and still isn't related to my tables. I can't think of another way to relate the two and create a mapper, because new emails keep arriving and my idea was for the AI to handle as much of the matching as possible.
I need a solution. Thank you for replying.

How would you do it? If you forget the AI, how would you accomplish this task?

Don't think of an LLM as a database; think of it more like a processor. Your prompt is your program, and if you give the program the right functions (tools), the program can decide to run itself again with a different prompt.

I don’t fully understand your process, but I’m guessing it could go something like this

step 1: LLM(prompt+email) → generate search query.

e.g.:

Email:

Hello, my name is John Smith, and I am severely aggravated by the fact that your product doesn’t work right

Prompt:

You are dispatcherbot. You don’t answer emails, but rather categorize them so that they can be processed by the correct entity. Please attempt to fill out the following schema

{
    "full_name": string, // name of sender
    "email"?: string, // don't set if not known
    "account_number"?: string, // 5 digit number. don't set if not known.
    "sentiment": "positive"|"neutral"|"negative", // how the user seems to feel
    "category": "product"|"billing"|"feedback"|"other" // pick whatever matches best
}

output:

{
    "full_name": "John Smith",
    "sentiment": "negative",
    "category": "product"
}
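A minimal Python sketch of how step 1 could be wired up, assuming the model replies with plain JSON that follows the schema above. The names `build_dispatch_prompt` and `parse_dispatch_reply` are hypothetical helpers made up for illustration, and the actual call to the model is left out because it depends on the client you use.

```python
import json

# Dispatcher instructions, condensed from the example prompt above.
DISPATCH_INSTRUCTIONS = (
    "You are dispatcherbot. You don't answer emails, but rather categorize "
    "them so that they can be processed by the correct entity. "
    "Reply with JSON only, using the keys full_name, email, account_number, "
    "sentiment and category. Leave out any key you cannot fill."
)

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}
ALLOWED_CATEGORIES = {"product", "billing", "feedback", "other"}


def build_dispatch_prompt(email_text: str) -> str:
    """Combine the fixed instructions with the raw email text."""
    return f"{DISPATCH_INSTRUCTIONS}\n\nEmail:\n{email_text}"


def parse_dispatch_reply(reply: str) -> dict:
    """Parse the model's JSON reply and reject values outside the schema."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("full_name"), str):
        raise ValueError("full_name is required")
    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError("unexpected sentiment value")
    if data.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError("unexpected category value")
    # account_number is optional; drop it unless it is a 5 digit string.
    account = data.get("account_number")
    if account is not None:
        if not (isinstance(account, str) and len(account) == 5 and account.isdigit()):
            del data["account_number"]
    return data
```

Validating the reply before using it matters because the model can return values outside the schema, and a bad account number should not reach the database lookup.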

step 2: do a lookup on your database

if account_number:
    return getInstructionsByAccountNumber(account_number)
elif full_name:
    return getInstructionsByName(full_name)

output:

John Smith is a choleric person that mellows out when talking about monster trucks. Try to deescalate the situation by relating the situation in terms of monster trucks.
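A rough sketch of what the lookup in step 2 might look like in Python with the standard `sqlite3` module. It assumes a `customers` table with the columns from the original question (`accountNumber`, `accountName`, `specialInstructions`, `countries_indicative`); the in-memory database and the function names `make_demo_db` and `get_special_instructions` are assumptions for illustration only, and your real database will need its own connection code.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory customers table holding the row from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        "accountNumber TEXT PRIMARY KEY, "
        "accountName TEXT, "
        "specialInstructions TEXT, "
        "countries_indicative INTEGER)"
    )
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        ("94571", "Test", "Instructions", 44),
    )
    conn.commit()
    return conn


def get_special_instructions(conn: sqlite3.Connection, fields: dict) -> Optional[str]:
    """Look up instructions by account number first, then by name."""
    account_number = fields.get("account_number")
    if account_number:
        row = conn.execute(
            "SELECT specialInstructions FROM customers WHERE accountNumber = ?",
            (account_number,),
        ).fetchone()
        if row:
            return row[0]
    # Unlike the pseudocode, this also falls back to the name when the
    # account number is present but matches no row.
    full_name = fields.get("full_name")
    if full_name:
        row = conn.execute(
            "SELECT specialInstructions FROM customers "
            "WHERE accountName = ? COLLATE NOCASE",
            (full_name,),
        ).fetchone()
        if row:
            return row[0]
    return None
```

The queries use `?` placeholders rather than string formatting, since the values come from model output derived from an untrusted email.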

step 3: process the email with the retrieved account instructions

Email:

Hello, my name is John Smith, and I am severely aggravated by the fact that your product doesn’t work right

Prompt:

You are Customer Service Bot. Your job is to answer customer emails. For this particular customer, you have the following special instructions:

Retrieved_Instructions:

John Smith is a choleric person that mellows out when talking about monster trucks. Try to deescalate the situation by relating the situation in terms of monster trucks.

output:

You know how the police don't really like it if you drive your monster truck on public roads? We're unfortunately in a similar situation - all our inventory is currently tied up in these damn supply chain issues so we…
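The three steps can be chained roughly as in the Python sketch below. The model call and the database lookup are passed in as plain functions, so nothing here depends on a particular client library. All names (`answer_email`, `build_reply_prompt`, the `llm` and `lookup` parameters) are hypothetical, and the sketch assumes the first model call returns valid JSON.

```python
import json
from typing import Callable, Optional

DISPATCH_INSTRUCTIONS = (
    "You are dispatcherbot. Categorize the email below and reply with JSON "
    "only, using the keys full_name, email, account_number, sentiment and "
    "category. Leave out any key you cannot fill."
)


def build_reply_prompt(email_text: str, instructions: Optional[str]) -> str:
    """Build the step 3 prompt, adding retrieved instructions when present."""
    parts = ["You are Customer Service Bot. Your job is to answer customer emails."]
    if instructions:
        parts.append(
            "For this particular customer, you have the following special "
            f"instructions:\n\nRetrieved_Instructions:\n{instructions}"
        )
    parts.append(f"Email:\n{email_text}")
    return "\n\n".join(parts)


def answer_email(
    email_text: str,
    llm: Callable[[str], str],
    lookup: Callable[[dict], Optional[str]],
) -> str:
    """Run the pipeline: extract fields, look up instructions, draft a reply."""
    # Step 1: ask the model to turn the email into structured fields.
    fields = json.loads(llm(f"{DISPATCH_INSTRUCTIONS}\n\nEmail:\n{email_text}"))
    # Step 2: ordinary code, not the model, fetches the matching data.
    instructions = lookup(fields)
    # Step 3: call the model again with the retrieved data in the prompt.
    return llm(build_reply_prompt(email_text, instructions))
```

The point of this shape is that the database rows never have to be learned by the model: they are fetched by normal code and placed into the second prompt.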
