Translations with term lists

luc.gosso · January 12, 2024, 10:47pm

Hi everyone,

I’m exploring the possibility of using the ChatGPT API for translation purposes and have a specific requirement. In my business, we use specialized terminology that varies across different languages.

I’m wondering if there’s a way to integrate a custom list of terms into the ChatGPT translation process. Essentially, I’d like to provide ChatGPT with a list of these terms in various languages to ensure that the translations it produces consistently use the correct business-specific language.

Has anyone done something similar or knows if this is feasible with the ChatGPT API? Any insights or suggestions on how to approach this would be greatly appreciated.

The lists are very long, about 10 000 rows… We could perhaps cut it down more but you get the picture.

Thanks in advance!

anon22939549 · January 13, 2024, 6:29am

Maybe make a custom GPT for this task and include these lists as your knowledge files, or connect it to a retrieval action to fill the context appropriately.

Diet · January 13, 2024, 6:35am

Hi! Welcome to the forums!

The most popular pattern for augmenting LLMs is called RAG - Retrieval Augmented Generation.

You basically have a vectordb (faiss would probably be a super good enough option in your case) that is populated by embedding vectors

What you’d probably do is embed definitions of all these terms, and retrieve a sublist of the most related terms to the document with a relatedness cutoff.

If you have tons and tons of examples, another thing you can potentially look at is fine-tunes. While I’m typically of the opinion that finetunes are a waste of money, it’s possible that in your case they might be worthwhile since you’re looking to emulate a specific style in your output.

I’d personally take a look at rag, you can throw a PoC together with jupyter in about an hour or so, and see where you get from there.

anon22939549 · January 13, 2024, 6:58am

I think this would be a poor use case for vector embeddings because it’s deterministic and we aren’t concerned about semantics.

What I’d do is take the text to be translated and,

Given the language of the text, scan the text to identify all the special terms present
Given the target language of the translation collect the matching pair special term
Augment the text to include the special term translation, so something like,

When dealing with special_term_english do this

Becomes,

When dealing with [special_term_english=special_term_spanish] do this

Then include in the instructions something like,

You must translate this document from English to Spanish. Included in this document are several terms-of-art which have very specific, precise translation requirements. To assist you in this effort these terms-of-art and their necessary and correct translations are delineated in the format [x=y] where x is the term-of-art in the source language and y represents the only acceptable translation of the term-of-art in the target language.

Diet · January 13, 2024, 7:03am

Ah, yeah if the source documents are consistent you can indeed just do that.

luc.gosso · January 13, 2024, 1:07pm

Thanks for the suggestions guys!

What about finetuning a model per language?

Finetuning “translate this text from English to Spanish: this is lingo” => “esta poco loco”

In theory, using “this is lingo” in a text, would then be replaced with “esta poco loco”

Finetuning with thousands of these would work no?

anon22939549 · January 13, 2024, 1:20pm

Fine-tuning could work, but… That might be a lot of fine-tuning which can get expensive, then running fine-tuned models which is more expensive, all without knowing ahead of time how it will perform.

Honestly, I recommend just building some scaffolding around the LLM to make it’s job easier. Not every task is meant for an LLM to do and even those that are not all of them need to be done without help.

Topic		Replies	Views
Need help with text translation (somewhat complex rules) API chatgpt , api , translation , assistants-api	2	46	May 5, 2025
Add user's vocabulary or reading level Prompting	8	1422	October 26, 2023
How to force GPT to use specific terms in translations Prompting translation	1	499	November 5, 2024
How to improve translation quality for specific theme? Prompting gpt-4	1	453	March 21, 2024
Custom model for domain-specific translation Community chatgpt	6	1546	September 5, 2023

Translations with term lists

Related topics