Dataset format for my own data from database

octavio.vindas · April 12, 2024, 3:17pm

I have a project where I have to read emails and generate a type of mapper to relate information from the email with information in 2 database tables through the fine tune that I think is the best option but I can’t understand with what I can do so that the model has the information from my tables and that when I send the email in plain text it returns the related information from my database data, as I structure the training file with my database data.
for example : I have the table customers : {‘table_name’: ‘customers’, ‘accountNumber’: ‘94571’, ‘accountName’: ‘Test’, ‘specialInstructions’: ‘Instructions’, ‘countries_indicative’: 44}, how I can create the jsonl file with table customers.

I apreciate every opinion, thanks.

Diet · April 12, 2024, 5:02pm

Welcome to the community!

I’d like to advise you that fine-tuning with your database data will not allow the model to recall that data. You can use fine tuning to modify behavior to an extent, but it doesn’t really affect knowledge.

I’d recommend a reevaluation of your process: how does a human solve this particular task? I imagine you might receive an email, and then perform a query to figure out the special instructions.

Trying to compose a prompt that instructs the model to perform these actions (i.e. generate a query), and then generate a response based on the query result will likely be a more fruitfuil pursuit, at least in my experience

octavio.vindas · April 12, 2024, 5:57pm

So isn’t it possible for me to train with data from my database and have the AI return that data to me? I have the process of taking the emails and the process of getting a GPT json with data that I need but it is data from the same email and it still does not relate, I can’t think of another way to relate and create a mapper because there are times when new emails arrive and my idea is that the AI was in charge of relating as much as possible.
I need a solution, thank you for replying.

Diet · April 12, 2024, 7:55pm

How would you do it? If you forget the AI, how would you accomplish this task?

Don’t think of an LLM as a database, think of it more like a processor. Your prompt is your program, and if you give the program the right functions(tools), the program can decide to run itself again with a different prompt.

I don’t fully understand your process, but I’m guessing it could go something like this

step 1: LLM(prompt+email) → generate search query.

e.g.:

Email:

Hello, my name is John Smith, and I am severely aggravated by the fact that your product doesn’t work right

Prompt
You are dispatcherbot. You don’t answer emails, but rather categorize them so that they can be processed by the correct entity. Please attempt to fill out the following schema
{
    "full_name": string, // name of sender
    "email"?: string, // don't set if not known
    "account_number"?: string, // 5 digit number. don't set if not known.
    "sentiment": "positive"|"neutral"|"negative", // how the user seems to feel
    "category": "product"|"billing"|"feedback"|"other" // pick whatever matches best
}

output:

{
    "full_name": "John Smith",
    "sentiment": "negative",
    "category": "product"
}

step 2: do a lookup on your database

if(account_number) return getInstructionsByAccountNumber(account_number)
elif(full_name) return getInstructionsByName(full_name)

output:

John Smith is a choleric person that mellows out when talking about monster trucks. Try to deescalate the situation by relating the situation in terms of monster trucks.

step 3: process the email with the retrieved account instructions

Email:

Hello, my name is John Smith, and I am severely aggravated by the fact that your product doesn’t work right

Prompt:

You are Customer Service Bot. Your job is to answer customer emails. For this particular customer, you have the following special instructions:

Retrieved_Instructions:

John Smith is a choleric person that mellows out when talking about monster trucks. Try to deescalate the situation by relating the situation in terms of monster trucks.

output:

You know how the police doesn’t really like it if you drive your monster truck on public roads? We’re unfortunately in a similar situation - all our inventory is currently tied up in these damn supply chain issues so we…

Topic		Replies	Views
Trained Fine-tuning on CSV File API chatgpt , api	0	483	February 27, 2024
Import differing text files and build a database - Python API gpt-4 , chatgpt , fine-tuning , api	6	23599	December 12, 2023
GPT 2 training in local gpu for custom database API	2	1756	July 27, 2023
Optimal Processing Data for reduced token Usage API	1	343	September 28, 2023
I need that the fine tune model answer over my training data, its posible? or ¿ Do I need use embeddings-based search? API embeddings , fine-tuning , training	3	158	April 16, 2024

Dataset format for my own data from database

step 1: LLM(prompt+email) → generate search query.

step 2: do a lookup on your database

step 3: process the email with the retrieved account instructions

Related Topics