Over a year ago, I asked on the OpenAI subreddit if someone had a recommendation for programs that help you collect and curate fine tuning data that you collect by hand. So essentially, I only wanted an UI that you could enter your data into so that you do not have to dabble with the jsonl-Files manually.
To my surprise, people didn’t really seem to know what I was talking about, or what use-case my application idea was for.
So I decided to develop a little application for my self, and I open-sourced it. It’s my first GUI application I have written with Godot, and I must say it was a really pleasant development-experience. (Though I still have no idea how container sizing is supposed to work.)
finetune-collect is, as previously mentioned, just a little UI where you can enter your fine-tuning conversations by hand so you don’t have to write a jsonl-file by hand. So it is made specifically for the use case where you do not have an existing dataset where that you can transform into fine-tuning files via a script.
Besides the main features of saving, loading, exporting and viewing sample conversations, it supports a number of comfort-of-life-features like preventing you from exporting some flawed conversations, global system-messages, retrieving a first draft of an answer from the API and more. It supports Text, Images and Function Tool Calls.
It’s available for Windows, Linux and as a Webapp.
If you want to try it out, have feature requests or bug reports, you can check out the GitHub Repo: github wielandb/finetune-collect