Health Q&A dataset

Hi, I am thinking about fine-tuning a model to answer simple health questions, but I can't collect enough Q&A data. If anyone can recommend sites or available datasets, please let me know. Thanks.

Using insert mode with Davinci 2:

To create a medical advice fine tune we will need to data scrape some websites that contain correct medical advice. Here is a list of those websites:
1.[insert]

Sample output:

1. https://www.webmd.com/
2. https://www.mayoclinic.org/
3. https://www.healthline.com/
4. https://www.medicalnewstoday.com/
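
For the scraping step itself, here is a rough sketch of how the readable text could be pulled out of a page once you have its HTML. It only uses Python's standard `html.parser`. The `ParagraphExtractor` class, the `extract_paragraphs` helper and the `SAMPLE_HTML` string are all made up for illustration; real pages from the sites above will have their own markup and will probably need extra filtering, and fetching the pages is left out here.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraphs, in document order
        self._in_paragraph = False
        self._buffer = []      # text fragments of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the list of paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page content, standing in for a downloaded article.
SAMPLE_HTML = """
<html><body>
<nav>Home | Conditions</nav>
<p>Drink plenty of   fluids.</p>
<p>Avoid <b>alcohol</b> and caffeinated drinks.</p>
</body></html>
"""

paragraphs = extract_paragraphs(SAMPLE_HTML)
for paragraph in paragraphs:
    print(paragraph)
```

Keeping only `<p>` text is a simple way to drop menus and other navigation, at the cost of missing advice that a site puts in lists or tables.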

In just a few minutes I expanded on this using [insert] mode:

To create a medical advice fine tune we will need to data scrape some websites that contain correct medical advice. Here is a list of those websites:
1. https://www.webmd.com/
2. https://www.mayoclinic.org/
3. https://www.healthline.com/
4. https://www.medicalnewstoday.com/

Here is an example of a medical issue and advice for the issue in a short summary from webmd.com

"If you have a cold, you might think you can't do much to feel better. In fact, there are a number of things you can do to ease your symptoms and speed your recovery. Drink plenty of fluids. Drinking lots of fluids helps to thin out mucus and replace fluids your body is losing because of your cold. Drink warm, clear fluids like water, broth, or decaffeinated tea. Drink them often throughout the day. Avoid alcohol and caffeinated drinks, which can make dehydration worse."

Link to the information: https://www.webmd.com/cold-and-flu/cold-guide/cold-remedies#1

an example line of an openAI dataset from the summary using the format <Question> and <Advice>:

"Can I drink alcohol if I have a cold?" and "No, alcohol will make dehydration worse."

Correct .jsonl formatting of the example for an OpenAI dataset (one JSON object per line, with no enclosing array):

{"question": "Can I drink alcohol if I have a cold?", "advice": "No, alcohol will make dehydration worse."}
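
If you collect many question/advice pairs, a small script can write them out for you. This is only a sketch: the `to_jsonl` helper is made up for illustration, and the default key names `prompt` and `completion` are an assumption based on what the legacy OpenAI fine-tuning format used, so check the current fine-tuning docs before uploading. JSONL itself just means one JSON object per line.

```python
import json


def to_jsonl(pairs, prompt_key="prompt", completion_key="completion"):
    """Turn (question, advice) pairs into JSONL text, one object per line.

    The default key names are an assumption about what the fine-tuning
    endpoint expects; pass "question" and "advice" to keep the thread's names.
    """
    lines = []
    for question, advice in pairs:
        record = {prompt_key: question, completion_key: advice}
        # json.dumps handles quoting and escaping inside the strings.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


pairs = [
    ("Can I drink alcohol if I have a cold?",
     "No, alcohol will make dehydration worse."),
]

jsonl_text = to_jsonl(pairs)
print(jsonl_text, end="")

# To save it to a file (the file name is just an example):
# with open("health_qa.jsonl", "w", encoding="utf-8") as f:
#     f.write(jsonl_text)
```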

Thanks a lot. Have you done this before?