What's the best way to handle data in JSONL?

Hi! I’m not actually a developer, but sometimes I want to train a model on my own data. From the documentation, I’ve learned that I need to organize my “dataset” in .jsonl format (which, as I understand it, means one complete JSON object per line).
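For example, if I’ve understood the docs correctly, each line is a self-contained training example like this:

```jsonl
{"messages": [{"role": "user", "content": "Hello"}, {"role": "assistant", "content": "Hi there!"}]}
{"messages": [{"role": "user", "content": "What is JSONL?"}, {"role": "assistant", "content": "One JSON object per line."}]}
```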

My data is quite large, and it’s very inconvenient for me to handle it in editors like VSCode or PyCharm.

  1. Are there any good methods for managing large .jsonl files?
  2. Wouldn’t it be great if platform.openai.com had a user-friendly interface where we could upload and manage this data before deciding to train a model?

Since each record is separated by a newline, you can split the file wherever you want and append to it without loading anything into memory, which is the whole point of JSONL.
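For example, a minimal sketch in Python (the file name and record here are just placeholders):

```python
import json

def append_record(path: str, record: dict) -> None:
    # Append mode: the existing file is never read into memory.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Hypothetical example in the chat fine-tuning format.
append_record("train.jsonl", {
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi! How can I help?"},
    ],
})
```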

You can organize your training data into folders and then write a script to collect it all into a single JSONL file.
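Something like this sketch, assuming each example lives in its own .json file under a data/ folder (the layout and names are assumptions, not anything the platform requires):

```python
import json
from pathlib import Path

# Walk data/ recursively and gather every .json file into one JSONL dataset.
with open("dataset.jsonl", "w", encoding="utf-8") as out:
    for path in sorted(Path("data").rglob("*.json")):
        example = json.loads(path.read_text(encoding="utf-8"))
        # One object per line is all JSONL is.
        out.write(json.dumps(example, ensure_ascii=False) + "\n")
```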


Haha, ChatGPT created the entire interface for my needs in just two minutes. We are living in incredible times!
And it works properly!
