Advice needed for JEL code prediction fine-tuning task


I’m currently working on a project where I’m trying to predict JEL codes (Journal of Economic Literature classification codes) based on a list of keywords. I’m using OpenAI’s language models for fine-tuning, specifically Babbage, Curie, and Davinci.

However, I’ve noticed that the results of each model can vary significantly, with surprising results such as Davinci being the worst performer. I’m seeking advice on the design of the prompt and associated completions to optimize the performance of the language model.

I’m considering two possible prompt formats and would like to know which one would be the most suitable for my task.

Prompt format 1:

Keyword 1: Australian dollar Keyword 2: Common currency Keyword 3: Monetary policy Keyword 4: New Zealand

Prompt format 2:

Keywords: DSGE model; Bayesian estimation; Time-varying risk premia; Monetary policy
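
To make the comparison concrete, here is a rough sketch of how both formats could be turned into prompt/completion training records in JSONL form. The function names are hypothetical, the separator and stop marker are assumed conventions (a fixed marker ending every prompt, a leading space and a fixed marker ending every completion), and the JEL codes are illustrative, not verified labels.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed marker telling the model the prompt has ended
STOP = " END"              # assumed marker telling the model the completion has ended


def format_numbered(keywords):
    """Prompt format 1: 'Keyword 1: ... Keyword 2: ...'."""
    return " ".join(f"Keyword {i}: {kw}" for i, kw in enumerate(keywords, start=1))


def format_list(keywords):
    """Prompt format 2: 'Keywords: a; b; c'."""
    return "Keywords: " + "; ".join(keywords)


def make_record(keywords, jel_codes, formatter=format_list):
    """Build one prompt/completion record; codes are sorted so the
    same label set always produces the same completion string."""
    return {
        "prompt": formatter(keywords) + SEPARATOR,
        "completion": " " + " ".join(sorted(jel_codes)) + STOP,
    }


keywords = ["Australian dollar", "Common currency", "Monetary policy", "New Zealand"]
record = make_record(keywords, ["F33", "E52"])  # illustrative codes only
line = json.dumps(record)  # one line of a JSONL training file
print(line)
```

Whichever format is used, it should be applied identically in every training record and at inference time. Format 2 is shorter, so it spends fewer tokens on boilerplate per example.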


If anyone has any advice on which prompt format would be more effective for JEL code prediction and why, I would greatly appreciate it. Additionally, if anyone has any resources that could help me better understand and study fine-tuning language models for this type of task, it would be very helpful.

Thank you in advance for your help.

Welcome to our dev community!

Have you tried just using a two-shot with one of the newer models like GPT-3.5? You might not even need to fine-tune if you just start each prompt with two or three examples.

If you want to continue down the fine-tuning path, I’ll let someone with more experience step up. I fine-tuned a couple of years ago, right after it was made available, but the results and cost sent me back to using the newer, better models.

Something like classification should be able to be done with a good prompt and 2 or 3 examples…maybe?
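
A minimal sketch of what such a few-shot prompt could look like, built as a plain string. The helper name, the instruction wording, and the example JEL codes are all assumptions for illustration; real labelled examples from your data should replace them.

```python
def build_few_shot_prompt(examples, keywords):
    """Assemble an instruction, a few labelled examples, and the new query.

    examples: list of (keyword_list, jel_code_list) pairs.
    keywords: keyword list to classify.
    """
    lines = ["Assign JEL classification codes to each keyword list.", ""]
    for ex_keywords, ex_codes in examples:
        lines.append("Keywords: " + "; ".join(ex_keywords))
        lines.append("JEL codes: " + ", ".join(ex_codes))
        lines.append("")
    # The final block is left open so the model fills in the codes.
    lines.append("Keywords: " + "; ".join(keywords))
    lines.append("JEL codes:")
    return "\n".join(lines)


# Illustrative examples only; the codes are not verified labels.
examples = [
    (["Australian dollar", "Common currency", "Monetary policy", "New Zealand"],
     ["E52", "F33"]),
    (["DSGE model", "Bayesian estimation", "Time-varying risk premia", "Monetary policy"],
     ["C11", "E32", "E52"]),
]
prompt = build_few_shot_prompt(examples, ["Inflation targeting", "Central bank independence"])
print(prompt)
```

The resulting string would be sent as the prompt (or as the user message for a chat model), and the model's reply read as a comma-separated list of codes.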

Thank you for your response and for the welcome.

Currently I can only use GPT-3.5 and GPT-4 through ChatGPT, not through the API (I don’t see these models even in the Playground). Am I doing something wrong?

In any case, my problem is more related to the fact that my classification task is both multi-class and multi-label, and I was looking for information on how to structure prompts and completions for cases like mine.
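
To illustrate what I mean by multi-label, here is a rough sketch of how a completion holding several codes could be parsed back into a label set and scored. The regular expression assumes codes of the form one capital letter followed by one or two digits (e.g. E5 or E52), the helper names are hypothetical, and the codes below are made up for the example.

```python
import re

# Assumption: a JEL code is one capital letter followed by one or two digits.
JEL_PATTERN = re.compile(r"\b[A-Z][0-9]{1,2}\b")


def parse_codes(completion):
    """Extract the set of JEL-like codes from a model completion."""
    return set(JEL_PATTERN.findall(completion))


def micro_f1(predicted, gold):
    """Micro-averaged F1 over parallel lists of label sets."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


# Made-up completions and gold labels, for illustration only.
predicted = [parse_codes(" E52 F33 END"), parse_codes(" C11 END")]
gold = [{"E52"}, {"C11", "E32"}]
print(micro_f1(predicted, gold))
```

Scoring on label sets rather than exact string match gives partial credit when the model gets some but not all of the codes right, which seems fairer when comparing models on this kind of task.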