Hi everyone,
I’m currently building a fanfiction translation app (Chinese → English) that uses the OpenAI API to both translate chapters and dynamically expand a glossary for term consistency.
I’m encountering issues specifically with the glossary extraction step, and I’d love some help from the community to understand if this is a model limitation, a prompt design flaw, or something else.
**What I’m trying to do**
After translating a chapter, I prompt the model to extract new glossary entries from the Chinese text that meet strict inclusion rules. Only specific categories (personal names, clans, techniques, ranks, etc.) should be added, and only if they are:
- Not already in `glossary_input`
- Found with an exact match in the English translation (`text_en`)
- Assigned a gender: “male”, “female”, or “neutral”

I also apply a “compound name rule”: for a name like “千手柱間”, the model should output three entries: “千手”, “柱間”, and the full name.
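Since the rules “not already in `glossary_input`” and “exact match in `text_en`” are deterministic string checks, I’ve been considering enforcing them in a post-validation pass instead of trusting the model alone. A rough sketch of what I mean (the function name is mine; the field names match my JSON schema):

```python
def filter_new_glossary(candidates, glossary_input, text_en):
    """Keep only candidate entries that pass the deterministic inclusion rules."""
    existing = {entry["vo"] for entry in glossary_input}
    kept = []
    for entry in candidates:
        if entry["vo"] in existing:
            continue  # rule: must not already be in glossary_input
        if entry["translated"] not in text_en:
            continue  # rule: the English wording must appear verbatim in text_en
        if entry.get("genre") not in ("male", "female", "neutral"):
            continue  # rule: genre must be one of the three allowed values
        kept.append(entry)
    return kept
```

This doesn’t fix the model’s category judgment (generic terms like “查克拉” still need prompt-side handling), but it makes the mechanical rules impossible to violate.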
**What’s going wrong**
When using `gpt-4o-mini`, the results often include:
- Terms that should be excluded, like “查克拉” (Chakra), which is too generic.
- Partial or missing application of the name-splitting rule.
- Over-inclusions like “火影” or “Forest of Death”, which are arguable, but I’d still prefer stricter filtering.
- Occasional confusion about whether a term is already present in `glossary_input`.
You can see an example below (excerpted for clarity):
```json
"glossary_input": [
  {"vo": "和風", "translated": "Kazekaze", "genre": "male"},
  ...
],
"new_glossary": [
  {"vo": "查克拉", "translated": "Chakra", "genre": "neutral"},
  {"vo": "火影", "translated": "Hokage", "genre": "neutral"},
  ...
]
```
**What I’m trying to figure out**
- **Model choice:** Is `gpt-4o-mini` capable of this level of control and nuanced filtering, or should I stick to `gpt-4o` despite the higher cost?
- **Prompting:** Am I doing something wrong with my system prompt design? Should I break it down more explicitly or reformat it?
- **Data flow:** Could the issue come from token limits, or from how I structure the inputs (long `text_cn`/`text_en` blocks)?
- **Other suggestions:** Are there known strategies to improve term filtering/selection for glossary-building tasks like this?
**Extra context**
- I’m using the `chat.completions.create()` API with `response_format="json_object"` to ensure structured output.
- Temperature is low (0.05), and I’ve tried adjusting `top_p` and the penalties with no significant effect.
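For completeness, here is roughly how I assemble the call. The helper function and its names are mine, not anything official; one thing worth double-checking is that the Python SDK’s JSON mode expects the dict form `response_format={"type": "json_object"}` rather than a bare string:

```python
import json

def build_glossary_request(system_prompt, glossary_input, text_cn, text_en,
                           model="gpt-4o-mini"):
    """Assemble the kwargs passed to client.chat.completions.create().

    Keeping this in one place makes it easy to inspect exactly what the
    model receives when debugging over- or under-inclusion.
    """
    user_payload = json.dumps(
        {"glossary_input": glossary_input, "text_cn": text_cn, "text_en": text_en},
        ensure_ascii=False,  # keep the Chinese readable in logs
    )
    return {
        "model": model,
        "temperature": 0.05,
        "response_format": {"type": "json_object"},  # dict form for JSON mode
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_payload},
        ],
    }
```

The result is then passed straight through, e.g. `client.chat.completions.create(**kwargs)`.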
- Here’s my prompt:

```
You are building the **new_glossary** array for a Chinese→{language_name} fan‑fiction chapter.
──────────────────────────────────────────────────────────────
✅ **Include a term** *only if* all conditions are met:
1. The term is **NOT** already present in `glossary_input`.
2. It belongs to **one** of these categories:
   • Personal given names & family names (e.g. 宇智波斑)
   • Clan / family names (e.g. 千手)
   • Techniques / jutsu / special powers (e.g. 豪火球之術)
   • Ranks, titles, organizations, summons, unique items or locations.
3. You can find the exact English wording **inside `text_en`**.
4. Genre must only be **male**, **female** or **neutral**.
🚫 **Do NOT add**:
• Everyday nouns/verbs/adjectives (水、火、眼睛…)
• Grammatical particles, measure words, conjunctions.
• Onomatopoeia or sound effects.
• Ads, platform promos or meta‑text.
🪄 **Compound personal names rule**
For any multi‑character personal name, output **three** entries:
{{"vo":"宇智波","translated":"Uchiha","genre":"neutral"}},
{{"vo":"斑","translated":"Madara","genre":"male"}},
{{"vo":"宇智波斑","translated":"Uchiha Madara","genre":"male"}}
📄 **Output format (JSON only)**
{"new_glossary":[ {{"vo":"…","translated":"…","genre":"…"}}, … ] }
If no term qualifies, return: {"new_glossary": []}
```
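One mitigation I’ve been considering for the partial name-splitting is to verify the three-entry rule deterministically after the response comes back. This helper is hypothetical and assumes I already know the surname/given-name split point for a full name (e.g. from an earlier chapter or a manual seed list):

```python
def missing_compound_parts(full_vo, split, entries):
    """Given a full name and the index where the surname ends, return
    the "vo" values the compound rule requires but the model's output
    lacks. An empty list means all three entries are present."""
    required = {full_vo[:split], full_vo[split:], full_vo}
    present = {entry["vo"] for entry in entries}
    return sorted(required - present)
```

If the returned list is non-empty, I could either re-prompt for just the missing pieces or synthesize them locally from the full-name entry.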
I’d be super grateful for any thoughts, advice, or even examples from anyone who’s done something similar — either in translation QA workflows or glossary generation!
Thanks in advance