Hi there,
We are a company that basically creates documents. These documents are written/ saved in a custom file structure created by us that is similar to xml/xsl.
And now the aim of my new project is to train ChatGPT (or other LLM in the future) to understand the documents and to be a able to write some (itx, that is how we call the files) itx code itself.
For example after a prompt of provide the itx code necessary to have a table with three columns and two rows.
Within the company we have been creating these documents for more than ten years within a own software. Thus we have millions of documents that can be provided as training data. Furthermore we can provide not only the itx, but also the exported pdf, docx, xsl-fo and html, so lots of other formats that GPT already understands for it to be able to create a connection between the itx and what the equivalent content looks like.
So to summarise my questions would be:
- How do I train it best?
- How do I prepare the data?
- And is something like this even possible
?
FYI: I have more than enough time and all the resources needed