Newbie Seeking Advice: Fine-Tuning GPT-3.5 Turbo for Civil Engineering Domain in Korean

Hello OpenAI Community,

I’m a newbie here and currently working on developing a Korean chatbot specifically tailored for the civil engineering domain. My goal is to fine-tune the GPT-3.5 turbo model to effectively recognize and handle specialized terminology in this field.

To achieve this, I have a bilingual glossary with 25,000 entries, containing both Korean and English translations of civil engineering terms. I am considering the best way to utilize this glossary to construct my dataset and enhance the model’s performance in recognizing these domain-specific terms.

Here are a few points I’m particularly seeking advice on:

1.Dataset Construction:
How should I structure my dataset using this glossary for the most effective fine-tuning? Should I include example sentences, or is a list of term translations sufficient?
2.Fine-Tuning Practices:
What are the best practices I should follow when fine-tuning the GPT-3.5 turbo model for this specialized domain? Are there specific parameters or techniques that are particularly effective for domain-specific language models?
3.Handling Bilingual Terms:
Given the bilingual nature of the glossary, how can I ensure the model effectively understands and translates between Korean and English civil engineering terms?
Any advice or suggestions would be greatly appreciated!

Thank you!