Preface: I usually speak and write German rather than English, which will probably show in my use of the English language.
I tested the following formats:
In my experience, Markdown (MD) works best for structured text that also uses formal symbols, such as math equations. My conjecture, and it is admittedly only that, is that this is because the output uses a similar data format and because (structured) training data is available in that format.
Whenever I ask for math assistance, the output is rendered on the client side as Markdown with embedded or MathJax-rendered equations.
The same syntax is also used to display, e.g., tables and headings at various hierarchy levels, and to embed graphics.
On the input side, I do not know with certainty which methods are used to preprocess the data. Hence the caveat above that the reasons are a conjecture.
The problem at present is that the system has little built-in functionality to transform knowledge from, say, PDF into formats that produce higher-quality output faster.
For the example under consideration, with a small amount of knowledge data, the following worked best: I used a service called Mathpix (free version) to convert PDFs with math expressions to MD, and then manually pasted the result of this preprocessing step, done outside the AI system, into the relevant input fields. To be blunt, this feels clumsy and inconvenient.
The idea originates from another problem that still awaits a solution, namely the handling of multiple PDFs. In roughly 30 tries, this only rarely worked without a system failure. With Markdown as the only input format, the problem did not occur, even when the knowledge was distributed among multiple files.
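As an illustration of the manual step described above, here is a minimal Python sketch that joins several already converted Markdown files into one text, so that it can be pasted into an input field in one go. It assumes the conversion from PDF has already been done elsewhere; the function name `merge_markdown` and the idea of using the file name as a section heading are hypothetical choices for this example, not something the system or Mathpix provides.

```python
from pathlib import Path


def merge_markdown(paths, heading_level=1):
    """Join several Markdown files into one string.

    Each file becomes its own section, headed by the file name
    (without extension), so the origin of each part stays visible.
    """
    sections = []
    for path in map(Path, paths):
        # Read the converted Markdown and drop surrounding blank lines.
        text = path.read_text(encoding="utf-8").strip()
        # Hypothetical convention: the file stem serves as the heading.
        heading = "#" * heading_level + " " + path.stem
        sections.append(f"{heading}\n\n{text}")
    # Separate sections with a blank line and end with a newline.
    return "\n\n".join(sections) + "\n"
```

For example, `merge_markdown(sorted(Path("converted").glob("*.md")))` would combine all Markdown files in a folder named `converted` (an assumed name) in alphabetical order.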
To sum up, I distilled the following "guidelines" for my purposes:
- Use Markdown in the multiple-file case.
- Use Markdown in the case of a single file that contains elements in languages other than natural language (e.g. math notation).
Ideally, conversion would be possible directly in the system, or through an integration with outside services. Furthermore, I am a bit skeptical that the product fulfills the data protection criteria, because at present it is possible to download the knowledge using prompts or the code interpreter, if activated.
Best wishes
David