Thanks for the response. I am using an LLM for assessing students. I want to conduct 2 exams (Librarian, Segmentation); these are the two scenarios I mentioned. I have kept it to single-turn, and in future I have to enhance this to multi-turn. The two scenarios together have 6373 JSON Lines.
Fine-tuning Trial 1 - Librarian (108 JSON lines) + Segmentation (6264 JSON lines), training loss: 0.2500, all system and user prompts the same
When a candidate comes for the Librarian exam, he/she is shown the title and abstract of a piece of literature, and based on this the examinee has to identify which subject domain it belongs to. I have titles and abstracts from Computer Science, Physics, Mathematics, Statistics and 2 other domains.
{"messages": [{ "role": "system", "content": "The Exam is for librarian and the literature belong to the domain of Quantitative Finance" },
{ "role": "user", "content": "Which literature needs classification?" },
{ "role": "assistant", "content": "The Title is Expropriations… and the Abstract is Using the Panama Papers…" }]}
There were 18 examples for each domain (Computer Science, Physics, Mathematics, Statistics...); all together, across all domains, there were 108 JSON lines.
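For reference, lines in this chat format can be assembled with a short script like the one below (a minimal sketch; the titles/abstracts here are truncated placeholders from the examples above, not the real data):

```python
import json

# Placeholder (domain, title, abstract) records; the real set had 18 per domain.
records = [
    ("Quantitative Finance", "Expropriations…", "Using the Panama Papers…"),
    ("Computer Science", "Contextual Regression: An Accurate …", "Machine learning algorithms…"),
]

lines = []
for domain, title, abstract in records:
    # One JSON line per example, matching the system/user/assistant layout above.
    lines.append(json.dumps({
        "messages": [
            {"role": "system",
             "content": f"The Exam is for librarian and the literature belong to the domain of {domain}"},
            {"role": "user", "content": "Which literature needs classification?"},
            {"role": "assistant",
             "content": f"The Title is {title} and the Abstract is {abstract}"},
        ]
    }))

with open("librarian.jsonl", "w") as f:
    f.write("\n".join(lines))
```

The segmentation file can be built the same way, swapping the system message and assistant content.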
When a candidate comes for the Segmentation test, he/she is shown details of a person such as age, marital status, profession, experience and others. Based on these details, the user has to identify the segment (A, B, C or D) to which the person can be mapped.
{"messages": [{ "role": "system", "content": "Segmentation is B and the Exam is for testing Persona categorisation" },
{ "role": "user", "content": "Which Persona needs classification?" },
{ "role": "assistant", "content": "Gender is Male, Ever Married is Yes, Age is 37, Graduated, Profession is Executive has Work Experience of 0 Year, Spending Score is Average, Family Size is 3" }]}
In the system message during inference:
(i) It was told to consider only queries related to the librarian/Segmentation exam and nothing else. It still accepted queries related to the other exam and also general questions; it ignored the system message content.
(ii) It was told to consider queries related to the librarian exam only, and was shown how to respond if the query belonged to segmentation or was a generic query. It still ignored the system message and considered all queries.
Fine-tuning Trial 2 - Librarian (30 JSON lines) + Segmentation (46 JSON lines), training loss: 0.2734, all system and user prompts the same
Within the fine-tuning data, one example of how to respond to an unrelated query was included for each exam.
Librarian exam: the user asks about a persona and the assistant responds that it is irrelevant.
{"messages": [{ "role": "system", "content": "The Exam is for librarian and the literature belong to the domain of Computer Science" },
{ "role": "user", "content": "Which Persona needs classification?" },
{ "role": "assistant", "content": "Irrelevant request" }]}
Segmentation exam: the user asks about literature and the assistant responds that it is irrelevant.
{"messages": [{ "role": "system", "content": "Segmentation is C and the Exam is for testing Persona categorisation" },
{ "role": "user", "content": "Which literature needs classification?" },
{ "role": "assistant", "content": "Irrelevant request" }]}
A few general questions were also added, with the corresponding assistant message as an irrelevant response. 11 of the total 76 JSON lines were examples of irrelevant responses.
(i) The model became very restrictive and answered even valid questions as irrelevant.
Fine-tuning Trial 3 - Librarian (60 JSON lines) + Segmentation (40 JSON lines), training loss: 1.3905, all system and user prompts the same
The fine-tuning system message itself was made restrictive and instructed the model to focus only on the related exam.
{"messages": [{ "role": "system", "content": "You are a helpful assistant focused on answering questions related to the librarian exam, specifically in Computer Science. Keep responses relevant, accurate, and avoid topics outside the exam scope." },
{ "role": "user", "content": "Which literature needs classification?" },
{ "role": "assistant", "content": "The Title is Contextual Regression: An Accurate … and the Abstract is Machine learning algorithms…" }]}
{"messages": [{ "role": "system", "content": "You are a helpful assistant focused on answering questions related to the persona segmentation, specifically from Segment D. Keep responses relevant, accurate, and avoid topics outside the exam scope." },
{ "role": "user", "content": "Which Persona needs classification?" },
{ "role": "assistant", "content": "Gender is Female, Ever Married is No, Age is 27, No Graduation, Profession is Homemaker has Work Experience of 8 Year, Spending Score is Low, Family Size is 1" }]}
Still, I was not able to confine the model to the exam it was set to.
I should be able to:
a. Set the LLM to the kind of exam it is conducting, so that it responds to relevant exam queries only.
b. In addition to the separation between the exams, it also has to exclude queries beyond the two exams.
c. In future, this has to be expanded to multi-turn, evaluating the examinee's answer: the domain to which the literature belongs in the librarian exam, and the segment to which a persona maps in the segmentation exam.
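In the meantime, since the fine-tuned model keeps ignoring the system message, one workaround is to reject out-of-scope queries deterministically before the model is ever called. A minimal sketch, assuming (as in the single-turn data above) that each exam has one fixed valid user question; the function name and the exact-match rule are my own, and a real multi-turn version would need a looser relevance check:

```python
from typing import Optional

# The single valid user question per exam, taken from the training examples.
EXAM_QUESTIONS = {
    "librarian": "Which literature needs classification?",
    "segmentation": "Which Persona needs classification?",
}

def route_query(exam: str, user_query: str) -> Optional[str]:
    """Return 'Irrelevant request' for out-of-scope queries,
    or None if the query should be passed on to the fine-tuned model."""
    expected = EXAM_QUESTIONS[exam]
    if user_query.strip().lower() != expected.lower():
        return "Irrelevant request"
    return None

# The librarian exam now rejects the segmentation question outright:
print(route_query("librarian", "Which Persona needs classification?"))    # Irrelevant request
print(route_query("librarian", "Which literature needs classification?")) # None
```

This keeps the refusal behaviour out of the fine-tuning data entirely, which also avoids the over-refusal seen in Trial 2.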
Thanks once again.