I am new to fine-tuning. I am fine-tuning a davinci model for plagiarism checking, and I have trained it on Python code. My question is: if I train it on Python, will it be able to detect plagiarism in other languages such as Java, Kotlin, or C?
Or, since it is trained only on Python, will it work only with Python?
Or is there a way to make my Python training data usable for other languages, rather than converting the Python training data to another language such as Java?
As a Prolog programmer who has used ChatGPT and other LLMs to generate Prolog code, I can say that, due to the limited amount of Prolog code available for training, the code generated for specific problems such as the 8 queens problem is almost always traceable back to a single source. I am quite familiar with the 8 queens problem and Prolog; see my SO answer.
That does not mean the code in the training data came from that page, only that that variation of the code is most likely what the completion will contain.
However, for some Prolog code that is widely distributed without attribution, it is hard to identify the original author.
So try running LLM-generated code through your application and see what it does. Even if it identifies the generated code as copied from somewhere, does it identify the name of the creator, or only similar code? If it does not identify the name of the creator, does it meet the definition of plagiarism?
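To make the "only similar code" case concrete, here is a minimal sketch of a plain similarity check. It is not a fine-tuned model and not how any particular plagiarism tool works; the `tokenize` and `similarity` helpers and the sample snippets are made up for illustration. It compares token sequences with Python's standard `difflib`, so it can report that two snippets look alike but says nothing about who wrote the original.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Hypothetical helper: split source text into identifiers, numbers,
    # and single punctuation characters. Whitespace is dropped, so
    # formatting differences do not affect the comparison.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code)


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] of matching tokens between the two snippets.
    matcher = difflib.SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False)
    return matcher.ratio()


# Made-up sample snippets, for illustration only.
known = "def add(a, b):\n    return a + b\n"
candidate = "def add(a,b):\n        return a+b\n"  # same code, different spacing
unrelated = "print('hello')\n"

print(similarity(known, candidate))
print(similarity(known, unrelated))
```

A score like this only tells you that two pieces of code are similar. Attributing the code to a creator would need extra information, such as a database mapping known sources to their authors.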
Note: This is information for you, not an invitation for others to start a debate or discussion. I don’t plan to engage in a debate over this with others; I am just trying to help you answer your question.