How would you go about fine tuning dataset for Medical DRG Coding?

risticgoran99 · March 18, 2024, 12:04am

Hi all, I have never done fine tuning/embeddings/langchain before and a problem I have is rather challenging so I’d be thankful if someone with expertise could guide me in the right direction. I’m trying to train LLM for medical patient coding, particularly adhering to Australian Coding Standards for diagnoses and procedures. The end goal is to develop a system capable of generating primary and secondary diagnoses, as well as procedures, based on a patient’s medical history. However, the current OpenAI GPT model lacks knowledge of Australian Coding Standards, ACHI Codes, making it challenging to accurately interpret diagnoses and procedures. The data about this is described in few books which you can see here:

This is just small sample from which you can see format that data is in, but there is 15k of these interventions and 7k diagnoses. Many entries say look up to other section for more information. This is raw knowledge about how to tackle coding, and what each code means. Additionally I have dataset of 22k real world examples that are already coded. However chatgpt currently has no knowledge what these codes mean and which codes should go with other and what not. So how would you go about building dataset? Would you use fine tuning or something with embeddings?

In the end it should work like this:

Given input: “Patient is admitted for drainage of ascites due to known underlying liver disease.” Desired output:

Principal diagnosis: Ascites
Additional diagnosis: Liver disease
Procedure: Drainage of ascites

If fine tuning is a way to go, can you give me few examples how dataset would look like for something like this.

medical-coding-ai · May 27, 2024, 2:03am

RAG will be very expensive with OpenAI since you have to include lots of supportive codes so AI can make reasoning with good differentiation of what DRG code excludes another. You can try to fine tune Llama 3 to sumarize that part and then put it in GPT 4 with long context window. It’s definetelly doeble, the question is - how much it would cost and if it worth it to deploy own model for that. Med.Report supports free medical coding automation on their website. It’s more for US market but you can try to contact them via linkedin. They have an amazing team of ML scientists and healthcare expert and are very opened. Since you are in Australia it shouldn’t be a problem in terms of competition.

Topic		Replies	Views
Is fine tuning gpt3.5 for medical qna such as medication dosing an appropriate use case? API fine-tuning	0	163	July 14, 2024
Creating and fine-tuning your own GPT model API	3	21636	September 21, 2024
Need Advice on Fine-Tuning GPT-4 for Multiple Output Types API	5	161	September 26, 2024
Seeking passionate collaborators to transform healthcare with AI Community chatgpt	4	1548	December 16, 2023
Fine-tuning GPT to learn a new coding language Prompting codex , chatgpt , plugin-development , fine-tuning , api	3	3602	December 24, 2023

How would you go about fine tuning dataset for Medical DRG Coding?

Related topics