GPT API (gpt-3.5-turbo) returns output truncated with ellipsis (...) even though the token limit is not reached

I have a case where I want to process 100 keywords into a list of keyword fragments. The expected result is that the number of outputs equals the number of inputs, but the actual response only gives me 12 results and truncates the middle part with an ellipsis.

I tried asking it in the prompt to display the whole result, but that did not work. Does anyone know how to solve this issue?

749 prompt (system + input) tokens counted by the OpenAI API.
111 output tokens counted by the OpenAI API.
860 total tokens counted by the OpenAI API. 

{
  "id": "chatcmpl-7ydbh7nLisqXAYbLBT6dFeso7XDRf",
  "object": "chat.completion",
  "created": 1694686285,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "[('shoe cabinet', 'lock', ''), ('yolu', '', ''), ('sara jane', 'shoes', ''), ('nerdy', 'bag', ''), ('mirror', 'with led lights', ''), ('boric', 'acid', ''), ('Washing Machine', 'drain hose', ''), ('side table', 'slim', ''), ('wpc', 'umbrella', ''), ('chaise...), ('karaoke', 'speaker', ''), ('Backpack', 'work', ''), ('key holder', 'leather', '')]"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 749,
    "completion_tokens": 111,
    "total_tokens": 860
  }
}

The output I want is in the following format, without the middle part truncated:

[('penutup mata', 'tidur', ''), ('toilet', 'cabinet', ''), ('sticker', 'floor', ''), ('maggi', 'kari cup', ''), ('samsung galaxy a02', 'casing', ''), ('jubah', 'hitam', ''), ('baju raya', 'budak perempuan', ''), ('baju kurung', 'kedah', ''), ('handsock', 'cotton', ''), ('chia seed', 'cereal', ''), ('cheongsam', 'dress', ''), ('winter jacket', 'women', ''), ('cargo pants', 'women', ''), ('cake', 'birthday', 'mickey mouse'), ('vitamin c', 'gummies', ''), ('speaker bluetooth', 'bass', ''), ('tommy hilfiger', 'perfume', 'men'), ('cadbury chocolate', 'hazelnut', ''), ('honda city', 'hatchback', 'body kit'), ('xiaomi mi robot', 'vacuum mop 2 pro', ''), ('vacuum', 'tineco', ''), ('vacuum', 'tefal flask', ''), ('bracelet', 'obsidian black', ''), ('bracelet dior', 'friendship', ''), ('bracelet pandora', 'gold', ''), ('bracelet hermes', 'women', ''), ('bracelet', 'men diamond', ''), ('lightstick', 'kpop', ''), ('lightstick', 'txt', ''), ('txt', 'photocard', ''), ('water bottle hydroflask', 'straw', ''), ('water bottle', '3l', ''), ('water bottle', 'slim', ''), ('tshirt long sleeve', 'men', ''), ('baju kurung plus size', '10xl', ''), ('mesin rumput', 'mitsubishi', ''), ('massage herbal oil', '801', ''), ('tjean bbq grill', 'multifunctional', ''), ('mosquito repellent', 'essential oil', ''), ('towel rack', 'stand', ''), ('shower head', 'holder', ''), ('rak toilet', 'bathroom', ''), ('lampu hiasan', 'bilik tidur', ''), ('Nike dunk low', 'paisley patterns', ''), ('gucci', 'perfume', 'men'), ('gucci', 'perfume', 'floral'), ('gucci guilty', 'perfume', ''), ('gucci guilty', 'homme', ''), ('manga demon slayer', 'full set', ''), ('tablet dyna', 'charcoal', ''), ('hair', 'wolf cut', ''), ('handbag', 'budak perempuan', 'cute'), ('handbag', 'budak', ''), ('eyelet baby', 'cloth', ''), ('dress', 'elsa', ''), ('langsir tingkap', 'tebal', ''), ('langsir tingkap', 'pendek', ''), ('langsir tingkap', 'dawai', ''), ('langsir sliding door', '3 panel', ''), ('petronas badminton', 'jersey', ''), 
('cat epoxy lantai', 'simen', ''), ('ceiling', 'lamp', ''), ('kaca mata', 'wanita', ''), ('gelas', 'kaca', ''), ('wallpaper', 'border', ''), ('wall', 'hanger', ''), ('marble wallpaper', 'white', ''), ('body lotion', 'whitening', ''), ('nescafe gold', 'ice cream', ''), ('car cover', 'outdoor protection', ''), ('tshirt polo', 'men', ''), ('mochi boba', 'nestle', ''), ('wardah', 'lipmatte', ''), ('lip', 'mask', ''), ('lip', 'liner', ''), ('3m tape', 'double sided', ''), ('massage oil', 'lidoria goodman', ''), ('hair spray', 'needs keratin', ''), ('nail polish', 'set', ''), ('sony wf 1000', 'xm4', ''), ('sony sound', 'bar', ''), ('sony ericsson', 'phone', ''), ('kettle electric', 'gooseneck', ''), ('kettle glass', 'electric', ''), ('kettle travel', 'electric', ''), ('kettle portable', 'electric', ''), ('purse charles and keith', 'women', ''), ('powerbank', 'iphone', ''), ('iphone', 'case', ''), ('iphone 13 pro max', 'case', ''), ('frozen', 'turkey', ''), ('cupcake', 'frozen', ''), ('frozen mix', 'berries', ''), ('buku teks', 'geografi', 'tingkatan 3'), ('buku percubaan', 'spm', ''), ('sliding door', 'handle', ''), ('door', 'partition', ''), ('cctv wireless', 'outdoor', ''), ('xiaomi', 'cctv', ''), ('stainless steel', 'shelf', '')]

The truncation I'm referring to is the `('chaise...)` part in the middle of the output above.

First thought: you're asking way too much of the model.

Remember, the model needs to process everything all at once; it doesn't think like we do.

If you give the task to a human brain, it can effectively chunk the work and methodically do one piece at a time until the end. We don't look at every token, in both the input and output, trying to determine whether it's relevant for the next token we want to generate; our attention system is much better at this.

The first thing I'd recommend is to help your assistant out by working within the constraints of its abilities and skill set. Experiment with giving it fewer keyword sets to break up per request. Between five and twenty is probably the sweet spot, depending on the complexity of the system prompt and the length of the keyword sets you're using.
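To make that concrete, here's a minimal sketch of the batching idea: split the keywords into small chunks and make one API call per chunk, then stitch the results together. The function names, the `call_model` callback, and the batch size of 10 are all illustrative assumptions, not part of any official API.

```python
def chunk(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_keywords(keywords, call_model, batch_size=10):
    """Process all keywords in small batches.

    `call_model(batch)` is a placeholder for your own function that sends
    one batch to the chat completion endpoint and returns the parsed list
    of fragment tuples for that batch.
    """
    results = []
    for batch in chunk(keywords, batch_size):
        results.extend(call_model(batch))
    return results
```

With 100 keywords and a batch size of 10, that's ten small completions instead of one huge one, so no single response has to carry all 100 results.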

The AI may also have more fluency, and better understand the need for unaltered data, if you specify the output format as a Python list of lists:

[['shoe cabinet', 'lock', ''], ['yolu', '', ''], ['sara jane', 'shoes', ''], ['nerdy', 'bag', ''], ['mirror', 'LED', 'illumination'], ['boric', 'acid', ''], ['Washing Machine', 'drain', 'hose'], ['side table', 'slim', ''], ['wpc', 'umbrella', ''], ['chaise', 'lounge', ''], ['karaoke', 'speaker', ''], ['Backpack', 'work', ''], ['key holder', 'leather', '']]
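Once the reply is a clean Python-style literal like the one above, you can parse it back into data on your side. A sketch, using `ast.literal_eval` because the reply is Python-literal syntax (asking the model for JSON and using `json.loads` instead would be even more robust); the `reply` string here is a shortened stand-in for a real completion:

```python
import ast

# Stand-in for the text of one completion from a small batch.
reply = "[['shoe cabinet', 'lock', ''], ['yolu', '', '']]"

# Safely evaluate the literal; raises ValueError if the model's
# output is malformed, which is your cue to retry that batch.
fragments = ast.literal_eval(reply)
```

Validating each batch like this (e.g. checking that `len(fragments)` matches the number of keywords you sent) also lets you detect and retry any batch the model truncated.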