How to best structure CSV embeddings to elicit clear and correct answers from an AI chatbot

Hi

Wondering if anyone can help.

I’m currently creating a CSV embedding file for an AI chatbot that answers user queries about certain aspects of a utilities market.

What I’m finding is that when I ask about individual standards, the AI produces an accurate response; however, when I ask it to list them all as a group, only some of the standards are listed and others are left out.

I was wondering what the community finds to be the best structure when creating a CSV embedding file. Is a question/answer structure the way to go, or is it better to categorise each group of questions and responses and, where certain responses link back to a transaction ID or market group, create separate columns for each?

Curious to hear your thoughts. I really want to avoid separate answers in my embedding getting merged together in the overall response, and to ensure every standard in the embedding file gets picked up.

Thanks

The best way would probably be to structure the CSV the same way a user might phrase a question to the AI chatbot.

The second approach is valid as well, but you would need a classifier of sorts first, which could categorise the question and then run the embeddings match. Though if your CSV/database isn’t too big, it shouldn’t be a problem.
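The classify-then-match flow could be sketched roughly as below. Everything here is a hypothetical stand-in: the category names and rows are made up, the toy classifier is just a keyword check (a real setup might use a zero-shot LLM call), and the similarity search is reduced to a category filter.

```python
# Hypothetical rows from a CSV knowledge base, each tagged with a category.
rows = [
    {"category": "Metering", "text": "Verification of Supply Arrangements bilateral"},
    {"category": "Billing", "text": "Settlement dispute standard"},
]

def classify(question, categories):
    # Toy classifier: return the first category whose name appears in the
    # question. A real implementation would likely be an LLM classification call.
    for category in categories:
        if category.lower() in question.lower():
            return category
    return None

def match(question, rows, category=None):
    # Restrict the candidate pool to the predicted category *before* the
    # embedding similarity search, so unrelated groups can't crowd it out.
    return [r for r in rows if category is None or r["category"] == category]

category = classify("What is a Metering bilateral?", ["Metering", "Billing"])
candidates = match("What is a Metering bilateral?", rows, category)
print(category, candidates)
```

The point of the design is that the embedding match only ever runs over rows from one category, which keeps partially related rows in other groups from being retrieved.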

This is interesting. I’m not sure I fully understand you.

I don’t use a question/answer structure. I also make multiple columns, in contrast to the two that most people use. If you’re using 3.5, the AI is quite good at understanding your dataset. Since you’re talking CSV, what does your data structure look like? I got the best results when “grouping” by row and always adding the attribute name to each value. It looks like: title row, attribute: 1, attribute: 2, attribute: 3. I did this with questions and answers as well.
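A minimal sketch of that “attribute name on every value” serialisation, assuming the goal is to turn each CSV row into one self-describing text chunk before embedding it. The column names and the row content here are invented for illustration.

```python
import csv
import io

# Hypothetical CSV with a title column plus named attributes.
raw = """Title,Attribute 1,Attribute 2
Verification of Supply Arrangements,Bilaterals,Metering
"""

def row_to_text(row):
    # Prefix each value with its column name so the embedded chunk is
    # self-describing, rather than a bare comma-separated line.
    return ", ".join(f"{name}: {value}" for name, value in row.items())

texts = [row_to_text(row) for row in csv.DictReader(io.StringIO(raw))]
print(texts[0])
```

Each resulting string carries its own labels, so even when a chunk is retrieved in isolation the model can tell which value belongs to which attribute.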
Also, if you want specific outputs, you can’t be too specific when creating the prompt. Writing out instructions can be very helpful for consistency. I also think prompts are very dataset-specific. One LLM application I built therefore uses an input form, so that all users query the same way, because the dataset has over 20’000 rows with 33 columns each. It is absolutely capable of structuring the response in the format you want.

I’ve provided a snip of my CSV embedding below. All the columns present in the embedding are included in this snip. I thought the best approach would be to place the columns containing the broadest information at the left and get more specific the further right you go, i.e. Group, Sub-Group, Market Term, Description, Market Code (if applicable).

I then took this embedding structure and reformatted it into a question/answer format, to compare the two and see which structure was optimal. I observed similar results with both: unless the user prompt directly matches the question in the embedding, the answer generated didn’t satisfy the question or prompt.

In terms of this user input form, does it generate a query based on one general prompt and then substitute in the values from the columns in your embedding? I.e. “The context of this question surrounds $(group), the user is interested specifically in $(sub-group). The user wishes to know, $(question)”

Which translates to: “The context of this question surrounds Bilaterals, the user is interested specifically in Metering. The user wishes to know what is a Verification of Supply Arrangements bilateral”
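That substitution step could be sketched with a plain string template, assuming the form collects the group, sub-group, and question as separate fields. The placeholder names are adapted for Python (`${sub_group}` rather than `$(sub-group)`, since `Template` identifiers can’t contain hyphens), and the field values are taken from the example above.

```python
from string import Template

# One fixed prompt template; every user query is built by filling in the
# form's field values, so all queries reach the model in the same shape.
prompt = Template(
    "The context of this question surrounds ${group}, "
    "the user is interested specifically in ${sub_group}. "
    "The user wishes to know, ${question}"
)

filled = prompt.substitute(
    group="Bilaterals",
    sub_group="Metering",
    question="what is a Verification of Supply Arrangements bilateral",
)
print(filled)
```

Because the surrounding wording never varies, the only thing that changes between queries is the substituted values, which makes retrieval behaviour much more predictable across users.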

I’ve used the values of the column headings from my snip to provide context. I’d never considered an input form before. Very interesting.

Hope this helps