What is the best format to input Q&A data for ChatGPT or the OpenAI Assistant?

jason123 · May 3, 2024, 1:32am

Hello everyone,
I am currently working on summarizing long texts, approximately 10,000 to 20,000 tokens in length.

My current approach involves segmenting the long text into smaller paragraphs, having AI generate summaries for each, and then integrating these into a final comprehensive summary for analysis.

My original data is stored in Excel in the following format:
1 \t Q \t question 1
\t A \t answer 1
2 \t Q \t question 2
\t A \t answer 2
… and so forth.

If I want to input this into ChatGPT to process the Q&A content, how should I modify the format? Would a direct text string input be better? Should the Q&A headings include specific numbers, like:
Q1: XXXXXX
A1: YYYYYYY
Q2: ZZZZZZZ
A2: LLLLLLLLLL
…

Or should each Q&A pair be on the same line, for example:
Q: XXXXXXX \t A: YYYYYYYY
Q: ZZZZZZZZ \t A: LLLLLLLLL

I’ve also considered using the JSON format, but given the length of my texts, I’d like to minimize unnecessary token usage.

Thank you all for your help!
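
For reference, a minimal sketch of turning the tab-separated Excel export described above into the numbered Q1/A1 plain-text layout. It assumes every row has three tab-separated fields (an optional number, a Q or A marker, and the text). The function names `parse_pairs` and `to_numbered_text` are hypothetical, not part of any library.

```python
def parse_pairs(tsv_text):
    """Parse rows of 'number<TAB>Q|A<TAB>text' into (question, answer) tuples."""
    pairs = []
    question = None
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # ignore blank rows
        fields = line.split("\t", 2)
        if len(fields) < 3:
            continue  # skip rows that do not have all three fields
        marker, text = fields[1].strip(), fields[2].strip()
        if marker == "Q":
            question = text
        elif marker == "A" and question is not None:
            pairs.append((question, text))
            question = None
    return pairs


def to_numbered_text(pairs):
    """Render pairs as 'Q1: ...' / 'A1: ...' lines, one per line."""
    lines = []
    for i, (q, a) in enumerate(pairs, start=1):
        lines.append(f"Q{i}: {q}")
        lines.append(f"A{i}: {a}")
    return "\n".join(lines)


# Sample rows in the same shape as the Excel export shown above.
sample = "1\tQ\tquestion 1\n\tA\tanswer 1\n2\tQ\tquestion 2\n\tA\tanswer 2"
print(to_numbered_text(parse_pairs(sample)))
```

The same list of pairs can be reused for any of the other layouts discussed in this thread.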

deduskamoroz.kgd · May 13, 2024, 3:11pm

Best format to feed to GPT is JSON:

{
  "QA_set": [
    {
      "question": "Question 1",
      "response": "response 1"
    },
    {
      "question": "Question 2",
      "response": "response 2"
    },
    {
      "question": "Question 3",
      "response": "response 3"
    }
  ]
}

Advice: to save tokens, shorten the keys (question / response) to 'Q' and 'R', then add a line to the prompt that explains them: 'Q = question, R = response'.
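
A minimal sketch of the short-key idea, using Python's standard `json` module. The sample pairs are placeholders. Character count is only a rough proxy for token count, which depends on the model's tokenizer, so treat the printed sizes as an indication rather than a measurement.

```python
import json

# Placeholder data; in practice these would come from the Excel export.
pairs = [("Question 1", "response 1"), ("Question 2", "response 2")]

# Short keys 'Q' and 'R' instead of 'question' and 'response'.
payload = {"QA_set": [{"Q": q, "R": r} for q, r in pairs]}

# separators=(",", ":") drops the spaces json.dumps inserts by default.
# ensure_ascii=False keeps non-ASCII text (e.g. Chinese) as-is instead of \uXXXX escapes.
compact = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
pretty = json.dumps(payload, ensure_ascii=False, indent=2)

# One legend line tells the model what the short keys mean.
prompt = "Q = question, R = response\n" + compact
print(prompt)
print(len(compact), "characters compact vs", len(pretty), "pretty-printed")
```

Pretty-printed JSON is easier for people to read, but the indentation and line breaks add characters that the model does not need.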

jason123 · May 14, 2024, 12:11am

Thank you for your response. Previously, I used to input text directly to the assistant like this:

Q: Q1 \n
A: A1 \n
Q: Q2 \n
A: A2 \n
......

However, when reviewing the thread content, it appears all stuck together like this: Q: Q1 A: A1 Q: Q2 A: A2…, which is quite strange.

Regarding your suggestion to use JSON, besides the challenge of converting the format, I have some questions:

1. Are the brackets, curly braces, and quotes in the JSON format ([ ] { } " ") counted toward token consumption?
2. Since the JSON format uses quotation marks (" "), should I remove any quotation marks from my Chinese Q&A content beforehand? Otherwise, they might break the JSON parsing, like this:

{
  "question": "Question 1",
  "response": "response 1"
}

3. You suggested shortening the keys (question/response) to 'Q' and 'R'. Can I assign numbers to Q and R, like Q1, R1, Q2, R2, until the end?

{
  "question": "Question 1",
  "response": "response 1"
}

Best regards,

deduskamoroz.kgd · May 20, 2024, 1:45pm

Hi,

1. Every character in the prompt, including brackets, braces, and quotes, counts toward token usage.
2. Before sending the request to the OpenAI API, you have to escape special characters in your text (such as double quotes) with a backslash: \

2.1. To make it easier, ask ChatGPT to write the code that converts your raw text into a JSON / API-compatible format in your preferred programming language.
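
A minimal sketch of the escaping step, assuming Python. The standard `json` module adds the backslashes itself, so quotation marks do not have to be stripped from the source text. The sample question and answer are made up for illustration.

```python
import json

# Hypothetical content containing double quotes and a line break.
question = '什么是"标记"?'  # Chinese for: What is a "token"?
answer = 'He said "it depends".\nSecond line.'

# json.dumps escapes the double quotes and the newline automatically.
encoded = json.dumps({"Q": question, "R": answer}, ensure_ascii=False)
print(encoded)

# Decoding returns the original text unchanged.
decoded = json.loads(encoded)
print(decoded["R"] == answer)
```

Building the JSON with a serializer rather than by string concatenation avoids most parsing errors caused by quotes inside the content.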

jason123 · May 20, 2024, 2:11pm

Thank you very much. So, if using the OpenAI Assistant API to analyze QA datasets, is it easier for the AI to understand if the data is in JSON format?

From the example you provided, can the “question” and “response” keys be sequentially numbered like Q1, Q2, Q3…Qn and A1, A2, A3…An?

If an interview dataset has too many QA pairs and I need to split them, what format should the QA_set be in?

Thank you.
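
The thread does not settle the splitting question, so the following is only one possible layout, sketched in Python. Each chunk keeps the same QA_set shape, and the `part` and `id` fields are assumptions added here so that pairs stay identifiable across chunks. Splitting by a fixed number of pairs is a simplification; a token-based limit would need a tokenizer. The function name `split_qa_set` is hypothetical.

```python
import json


def split_qa_set(pairs, max_pairs):
    """Yield JSON strings, each holding at most max_pairs Q&A pairs."""
    for start in range(0, len(pairs), max_pairs):
        batch = pairs[start:start + max_pairs]
        payload = {
            "part": start // max_pairs + 1,  # 1-based chunk number (assumed field)
            "QA_set": [
                # 'id' keeps the original numbering across chunks (assumed field)
                {"id": start + i + 1, "Q": q, "R": r}
                for i, (q, r) in enumerate(batch)
            ],
        }
        yield json.dumps(payload, ensure_ascii=False, separators=(",", ":"))


# Placeholder data: five pairs split into chunks of two.
pairs = [(f"question {n}", f"answer {n}") for n in range(1, 6)]
chunks = list(split_qa_set(pairs, max_pairs=2))
for chunk in chunks:
    print(chunk)
```

Keeping the number in an `id` field, rather than in the key names (Q1, Q2, ...), means every chunk uses the same two keys.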
