Sorry, I cannot say what will “work fine” and what will not when fine-tuning on someone else’s data. Fine-tuning also requires properly formatted data, including stops, separators, and whitespace, all of which is documented in the OpenAI docs. So the JSONL must validate both for basic JSONL syntax AND for the requirements specified in the OpenAI API docs.
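For reference, a record in the legacy prompt/completion JSONL format looks roughly like this; the `\n\n###\n\n` separator, the leading space in the completion, and the ` END` stop are conventions along the lines the fine-tuning guide suggests, not values specific to your data:

```jsonl
{"prompt": "What is the capital of France?\n\n###\n\n", "completion": " Paris END"}
{"prompt": "What is the capital of Japan?\n\n###\n\n", "completion": " Tokyo END"}
```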
I’m not sure if the above is a question or not.
If I were developing a production application (such as yours?), I would create a test setup using various methods, including the following:
- Database search, simple “LIKE” SQL expressions.
- Database full-text search.
This is because these searches are actual forms of “data extraction” (your words); both are sketched below.
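Here is a minimal sketch of both approaches in Python with SQLite; the table and column names are placeholders, and the FTS5 part assumes your SQLite build includes that extension:

```python
# Minimal sketch comparing a LIKE search with SQLite's FTS5 full-text search.
# The "docs" table and "body" column are placeholders for this demo.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("The quick brown fox",),
     ("Fine-tuning requires JSONL data",),
     ("Full-text search is fast",)],
)

# 1) Simple LIKE search: plain substring match, no ranking, full table scan.
like_hits = conn.execute(
    "SELECT body FROM docs WHERE body LIKE ?", ("%search%",)
).fetchall()

# 2) FTS5 full-text search: tokenized index with relevance ranking.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(body)")
conn.execute("INSERT INTO docs_fts SELECT body FROM docs")
fts_hits = conn.execute(
    "SELECT body FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank",
    ("search",),
).fetchall()

print("LIKE:", like_hits)
print("FTS5:", fts_hits)
```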
GPT models are not “extraction engines”; they are text auto-completion / prediction engines. You cannot “extract” information; you can only fine-tune or train to help the model generate text as it “babbles,” auto-completing based on user prompts.
In GPT-style architectures, generation is next-token prediction performed by the decoder, and fine-tuning adjusts those prediction weights; it does not add a lookup or retrieval mechanism.
Again, as I stated, from what I know of your use case (so far), a GPT-based AI model is suboptimal for your application, and if I were developing the application component you have described (so far), I would use a database and some text-search method, especially because what you have shown so far are “shortish” strings, and such strings are suboptimal for both fine-tuning and embedding.
If you are simply experimenting, then consider comparing these methods:
- Database search, simple “LIKE” SQL expressions.
- Database full-text search.
- Fine-tuning GPT model(s).
- Embedding vectors (sketched below).
Based on what I have read of your requirement, if you develop well-thought-out code for all of the above, you will find that one of the DB approaches, with text or full-text search, works the best (and costs the least).
An experienced developer should be able to write an experimental test app for each of the above in a few hours each, so it’s easy to compare, which is always good practice if you wish to become an expert.
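For the embedding leg of that comparison, here is a rough sketch using the pre-1.0 `openai` Python library; the model name, the corpus strings, and the query are placeholders, not your data:

```python
# Sketch of the embedding-vector approach: embed a small corpus, then rank
# strings against a query by cosine similarity.
import numpy as np
import openai  # pip install "openai<1.0"

openai.api_key = "sk-..."  # your API key

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

corpus = ["error code 42: disk full", "error code 7: network timeout"]
vectors = [embed(s) for s in corpus]

query = embed("the disk ran out of space")
best = max(zip(corpus, vectors), key=lambda cv: cosine(query, cv[1]))
print(best[0])  # nearest string by cosine similarity
```

Note that with very short strings, as mentioned above, the embeddings carry little context, which is part of why the DB methods tend to win here.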
HTH
OBTW, I am not just “making this up,” @vasundhra362000; I have these components set up already in my “OpenAI Lab,” which I run in dev on my desktop.
I have methods to validate the JSONL data as well, so it’s a good idea for developers to validate the data in their app, both against basic JSONL syntax and against the OpenAI API fine-tuning requirements (specified in the API docs).
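Here is a rough sketch of such a two-level check, assuming the legacy prompt/completion format; the `SEPARATOR` and `STOP` values are my own choices, not values the API mandates:

```python
# Two-level JSONL check: (1) each line parses as JSON, and (2) each record
# follows the prompt/completion shape described in the fine-tuning docs.
import json

SEPARATOR = "\n\n###\n\n"  # must match what you use at inference time
STOP = " END"              # likewise, your chosen stop sequence

def validate_jsonl(path: str) -> list[str]:
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {n}: not valid JSON ({e})")
                continue
            if not isinstance(rec, dict) or set(rec) != {"prompt", "completion"}:
                problems.append(f"line {n}: expected exactly prompt/completion keys")
                continue
            if not rec["prompt"].endswith(SEPARATOR):
                problems.append(f"line {n}: prompt missing separator")
            if not rec["completion"].startswith(" "):
                problems.append(f"line {n}: completion should start with whitespace")
            if not rec["completion"].endswith(STOP):
                problems.append(f"line {n}: completion missing stop sequence")
    return problems

if __name__ == "__main__":
    for problem in validate_jsonl("train.jsonl"):
        print(problem)
```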
HTH