Need human-like responses to test model performance

I'm working on an LLM application and now need to test its performance on a given set of articles. How can I create a set of questions with answers that are good, human-like responses, so that I can compare them against the answers generated by the model?

Question types like how, why, what, and more can be used.

This is an evaluation, or "Eval". You can find details here


One wouldn’t use a framework ultimately designed for submitting AI feedback cases to OpenAI.

If you want a set of questions, you don't actually need to create them yourself. You can use one of several open-source training sets that were used to train AI models.

GPT4All, for example, has 700k+:

1 Data Collection and Curation
We collected roughly one million prompt-response pairs using the GPT-3.5-Turbo OpenAI API between March 20, 2023 and March 26th, 2023. To do this, we first gathered a diverse sample of questions/prompts by leveraging five publicly available datasets:
• The unified chip2 subset of LAION OIG.
• Coding questions with a random sub-sample of Stackoverflow Questions
• Instruction-tuning with a sub-sample of Bigscience/P3
• Conversation Data from ShareGPT
• Instruction Following Data curated for Dolly (Conover et al.)
We additionally curated a creative-style dataset using GPT-3.5-Turbo to generate poems, short stories, and raps in the style of various artists.
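If you'd rather pull one of these public sources directly than write questions by hand, a minimal sketch looks like this (assuming the Hugging Face `datasets` library and the `databricks/databricks-dolly-15k` dataset, which is not the exact corpus quoted above, just a convenient open instruction set with reference answers):

```python
# Sketch: load prompt/response pairs from a public instruction dataset
# to use as ready-made questions and reference answers.
# Assumes: `pip install datasets`; dataset id databricks/databricks-dolly-15k.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# Keep only question-answering records that come with a source passage,
# which is closest to "ask questions about a given article".
qa_pairs = [
    {"question": row["instruction"], "context": row["context"], "answer": row["response"]}
    for row in dolly
    if row["category"] == "closed_qa"
]

print(len(qa_pairs), "question/answer pairs")
print(qa_pairs[0])
```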

You can also see how gpt-3.5-turbo answered in March vs today…

If you need to ask domain-specific questions, you can probably have GPT-4 synthesize a set of them. However, you'll also likely want to go back to the March model version if you don't want it to give up on the task after 500 tokens.
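One way to do that synthesis, as a rough sketch: feed each article to a GPT-4-class model and ask for question/answer pairs. This assumes the current OpenAI Python SDK (`pip install openai`) with an `OPENAI_API_KEY` in the environment; the model name, prompt wording, and `make_qa_pairs` helper are placeholders, not a prescribed recipe.

```python
# Sketch: have a GPT-4-class model synthesize Q&A pairs from one article.
import json
from openai import OpenAI

client = OpenAI()

def make_qa_pairs(article_text: str, n: int = 5) -> list[dict]:
    """Ask the model for n human-style question/answer pairs about the article."""
    prompt = (
        f"Read the article below and write {n} question/answer pairs a careful human "
        "reader would produce. Use a mix of how/why/what questions. "
        'Return JSON: {"pairs": [{"question": "...", "answer": "..."}]}\n\n'
        f"ARTICLE:\n{article_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any capable chat model that supports JSON mode
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["pairs"]
```

You'd still want a human pass over the synthesized pairs before treating them as ground truth.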

I'm using a pre-trained GPT model for my use case, so I need ground truth to evaluate the results and compute metrics.
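One simple starting point for that comparison is a token-level overlap score (SQuAD-style F1) between each generated answer and its reference answer. This is just one lexical metric; embedding similarity or an LLM judge are common complements. A minimal sketch in plain Python, with an illustrative function name and sample strings:

```python
# Sketch: SQuAD-style token-level F1 between a model answer and a reference answer.
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(ground_truth)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: score one model answer against its reference answer.
print(token_f1("The capital of France is Paris.", "Paris is France's capital city."))
```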