Creating a Fine-Tuned Model from PDF Data in Node.js: Need Advice and Recommendations

Hey everyone! :wave:

I’m currently working on a project where I have a book in PDF format, and I’m aiming to fine-tune a GPT model on the contents of that PDF using Node.js. Fine-tuning expects the training data in JSONL format.

I’m seeking advice and recommendations from the community on the best approach for converting and using PDF data for fine-tuning in Node.js. Specifically, I’m interested in insights on data structuring, tools for extracting text from PDFs, and any specific libraries or techniques that have proven helpful in similar projects.
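
To make the question concrete, here is a rough sketch of the shape I have in mind. It assumes the text has already been pulled out of the PDF into a plain string (the extraction step is not shown, since that is one of the things I’m asking about), and it assumes the chat-style JSONL layout with a `messages` array per line. The helper names `chunkText` and `toJsonl`, the chunk size, and the system prompt are made up for illustration.

```javascript
// Hypothetical helpers: names, chunk size, and system prompt are assumptions.

// Split extracted text into chunks of roughly maxChars, breaking on blank lines.
// A single paragraph longer than maxChars is kept whole rather than cut.
function chunkText(text, maxChars = 2000) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.replace(/\s+/g, " ").trim()) // collapse line wraps from the PDF
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = "";
  for (const para of paragraphs) {
    if (current && current.length + 1 + para.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = current ? current + "\n" + para : para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Serialize { prompt, answer } pairs as JSONL: one JSON object per line.
// JSON.stringify escapes newlines, so each record stays on a single line.
function toJsonl(examples, systemPrompt = "You answer questions about the book.") {
  return examples
    .map(({ prompt, answer }) =>
      JSON.stringify({
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: prompt },
          { role: "assistant", content: answer },
        ],
      })
    )
    .join("\n");
}
```

The part I’m least sure about is the step in between: how to turn each chunk into sensible `prompt` / `answer` pairs, since the raw book text is not in that form.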

Your experiences and expertise are highly valued! Thanks in advance for your insights and recommendations!