Best model for creating a code reviewer

Hello all, I am working on an application that will review code much like ChatGPT does. I want to fine-tune a model so that it lists the errors, the suggestions, and the newly rewritten code. Which OpenAI model is suitable for this task? I am thinking of going with GPT-4o (4 omni) or training my own model using Azure AI Studio. Do you all have any suggestions on how I can proceed, and what free or similar tools are available on the market?
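Roughly, this is the kind of call I have in mind (the model name, system prompt, and output sections are just placeholders for illustration):

```python
from openai import OpenAI

client = OpenAI()

code_snippet = open("example.py").read()  # the code to be reviewed

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Placeholder system prompt describing the desired review format.
        {"role": "system", "content": "You are a code reviewer. Reply with three sections: Errors, Suggestions, Rewritten code."},
        {"role": "user", "content": code_snippet},
    ],
)
print(response.choices[0].message.content)
```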

If I were you, I would start with an MVP based on a fine-tuned gpt-3.5-turbo. That way you'll get something working fast, with good quality. Going with Azure AI Studio will be much more time-consuming and expensive; in my opinion it's suited for a later stage of your idea.

You can fine-tune gpt-3.5 easily and effectively; here is a guide.
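The actual fine-tuning call is only a few lines with the OpenAI Python SDK. A minimal sketch, assuming your training data is already in a chat-format file called reviews.jsonl:

```python
from openai import OpenAI

client = OpenAI()

# Upload the training file (assumed to be chat-format JSONL).
training_file = client.files.create(
    file=open("reviews.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on gpt-3.5-turbo.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# Poll the job; once it succeeds, the new model name is in job.fine_tuned_model.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```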

Also, I developed a free web app to quickly create curated JSONL datasets for fine-tuning. You can check it out here. Let me know if you like it!
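For context, each line of the JSONL file is one training example in the chat format. A sketch of what a code-review example could look like (the prompts and the review format here are just illustrative, not a prescribed schema):

```python
import json

example = {
    "messages": [
        {"role": "system", "content": "You are a code reviewer. List errors, suggestions, and a rewritten version."},
        {"role": "user", "content": "def add(a, b):\n    return a - b"},
        {"role": "assistant", "content": "Errors: the function subtracts instead of adding.\nSuggestions: add type hints.\nRewritten code:\ndef add(a: int, b: int) -> int:\n    return a + b"},
    ]
}

# Append one example per line to the training file.
with open("reviews.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```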

@guidotrevisan6 Thanks for replying, but why not go with 4o? It is much faster, more cost-effective, and trained on more recent data compared to gpt-3.5.

GPT-4o would be a good choice, but there is a waitlist to fine-tune it and I wouldn't bet on getting access soon enough. The dataset you will use to train GPT-4o will be the same one you'd use for GPT-3.5, so you might as well start with the model you can access right now and join the waitlist in the meantime.

If they ever release it (unlikely), CriticGPT would be the best choice for this.

Until then, gpt-4 or gpt-4o will be the best OpenAI models for this task.

@guidotrevisan6 @anon22939549 I read the docs; for now we can't fine-tune gpt-4, so I guess gpt-3.5 should be the go-to model. Do you guys have any idea about Claude?

First, you'll want to make sure that a fine-tuned gpt-3.5-turbo actually outperforms gpt-4o. Note that in the CriticGPT paper I shared, they were using a fine-tuned gpt-4 model tuned on about 40,000 examples and a total of about 12M tokens. While this isn't an overwhelmingly large amount for fine-tuning, it's likely a very high-quality training set. And even then, it didn't outperform the base gpt-4 model by that much. So, unless your fine-tuning data is much larger and/or better than theirs, I wouldn't be optimistic about a fine-tuned gpt-3.5-turbo being better than gpt-4o.
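A quick way to sanity-check that is to run the same review prompt through both models and compare the outputs side by side (or score them against a held-out eval set). A rough sketch, where the fine-tuned model id is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

code_snippet = open("example.py").read()
prompt = "Review this code. List errors, suggestions, and a rewritten version:\n\n" + code_snippet

def review(model: str) -> str:
    # Send the same review prompt to whichever model is being evaluated.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Compare the fine-tuned gpt-3.5-turbo (placeholder id) against base gpt-4o.
for model in ["ft:gpt-3.5-turbo-0125:your-org::abc123", "gpt-4o"]:
    print(f"--- {model} ---")
    print(review(model))
```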