Trying parallel function calling—I get an illogical response

I’m trying to be very clear and straightforward, and to avoid making two API calls that each resend the same source text. I’ve tried a few different ways of phrasing this:

    {"role": "system", "content": "You are a helpful translation assistant, specializing in technical documentation, designed to output JSON."},
    {"role": "system", "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."},

    {"role": "user",   "content": "1. You will translate a technical markdown article from English into French."},
    {"role": "user",   "content": "2. Then you will save the French version."},

    {"role": "user",   "content": "3. Then you will translate the English markdown article into German."},
    {"role": "user",   "content": "4. Finally, you will save the German version too."},

    {"role": "user",   "content": "This is the English source text:"},
    {"role": "user",   "content": just_the_en_text}

My function is save_translated_article(), and gpt-4-turbo does fine when translating into one language per API call.
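For reference, here’s a minimal sketch of the tool definition I’d expect a setup like this to use. The parameter names (`language`, `translated_text`) are my assumption; the thread doesn’t show the real save_translated_article() signature.

```python
# Hypothetical tool schema for save_translated_article().
# NOTE: the property names below are assumptions; match them to your
# actual function before passing tools=... to chat.completions.create().
tools = [{
    "type": "function",
    "function": {
        "name": "save_translated_article",
        "description": "Save one translated markdown article.",
        "parameters": {
            "type": "object",
            "properties": {
                "language": {
                    "type": "string",
                    "description": "Target language code, e.g. 'fr' or 'de'",
                },
                "translated_text": {
                    "type": "string",
                    "description": "The full translated markdown article",
                },
            },
            "required": ["language", "translated_text"],
        },
    },
}]
```

With gpt-4-turbo, a schema like this is passed as `tools=tools` (optionally with `tool_choice="auto"`) so the model can emit one tool call per saved translation.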

Here’s the odd, contradictory response:

To proceed with the translation process, I need to first translate the article into French
and then save it. After that, I will translate it into German and save that version as well.

Please provide me with the French and German translations so I can assist you further in saving the translated markdown articles.


Has anyone been successful asking for two pieces of work and getting parallel function calls like in the sample code?

Why not just send these API calls out, retrieve the responses, and save them directly yourself?

But make it easy for the LLM: just one translation per call.

Also, you can get parallelism from these synchronous API calls by pumping all the data into something that auto-scales out and can run multiple calls at the same time across different instances (e.g. AWS Lambda).

Just limit the calls so you don’t exceed your API quotas (a basic “sleep” statement in the loop will do).
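A minimal sketch of that loop, assuming a hypothetical translate_one() wrapper around a single-language API call (the throttle value is a placeholder; tune it to your actual quota):

```python
import time

LANGUAGES = ["fr", "de", "es", "it"]  # extend to all 15 target languages
CALLS_PER_MINUTE = 60                 # placeholder; set from your API quota

def translate_one(lang: str) -> str:
    # Hypothetical wrapper: one synchronous API call, one target language.
    # Keeping each call this simple is what makes it reliable.
    return f"<{lang} translation>"

results = {}
for lang in LANGUAGES:
    results[lang] = translate_one(lang)
    time.sleep(60 / CALLS_PER_MINUTE)  # basic throttle between calls
```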


I’m trying to avoid huge input token fees.

Passing in the English article uses 770 prompt tokens.

I want to translate it into 15 languages without resending the English text over and over.
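Rough arithmetic on the repeated-source cost (ignoring instruction overhead, so these are lower bounds):

```python
EN_SOURCE_TOKENS = 770
N_LANGUAGES = 15

# One language per call: the English source is resent 15 times.
one_language_per_call = EN_SOURCE_TOKENS * N_LANGUAGES

# Three languages per call: the source is resent only 5 times.
three_languages_per_call = EN_SOURCE_TOKENS * (N_LANGUAGES // 3)

print(one_language_per_call)     # 11550
print(three_languages_per_call)  # 3850
```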

Here’s another odd response after re-phrasing the prompt:

Understood. However, I cannot directly translate text. Please provide me with the
translated French and German texts, and I will proceed to call the save_translated_article() function with the respective translations.

Very strange. It does fine translating text when only asked to do one language at a time.

It might seem strange, but all those instructions add noise to the model’s input and reduce the signal-to-noise ratio (SNR).

So to get a higher SNR (better reliability), you need to go slower: one language at a time, or small clusters of languages at a time.

The more complex your instructions, the higher probability of failure.

It’s a trade-off (cost vs. quality).


Ah, I actually got two translated texts with this rewrite; I had a hunch.

    {"role": "user",   "content": "This is the English source text:"},
    {"role": "user",   "content": just_the_en_text},

    {"role": "user",   "content": "I need separate function calls for each save action:"},

    {"role": "user",   "content": "You will translate the source text into French."},
    {"role": "user",   "content": "You will translate the source text into German."},

    {"role": "user",   "content": "Call save_translated_article() with the French result."},
    {"role": "user",   "content": "Call save_translated_article() again, this time with the German version."},
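When a prompt like this works, the two saves come back as parallel tool calls on the response message, each with a JSON-encoded arguments string. Here’s a sketch of the handling side, using a mocked fragment shaped like what the API returns (the argument names are my assumption):

```python
import json

# Mocked fragment shaped like response.choices[0].message.tool_calls;
# real values come from the API, these are illustrative only.
tool_calls = [
    {"function": {"name": "save_translated_article",
                  "arguments": '{"language": "fr", "translated_text": "Bonjour"}'}},
    {"function": {"name": "save_translated_article",
                  "arguments": '{"language": "de", "translated_text": "Hallo"}'}},
]

saved_languages = []
for call in tool_calls:
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    # your real save_translated_article(**args) would go here
    saved_languages.append(args["language"])
```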

The token usage for this double call:

(completion_tokens=1855, prompt_tokens=841, total_tokens=2696)

For English->German alone:

(completion_tokens=838, prompt_tokens=770, total_tokens=1608)

For English->French alone:

(completion_tokens=1018, prompt_tokens=770, total_tokens=1788)

That’s a savings of 700 tokens—if the quality hasn’t suffered, as you pointed out might happen.
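The arithmetic checks out against the usage figures above:

```python
combined_total = 2696          # one call: French + German together
separate_total = 1608 + 1788   # German-only call + French-only call

savings = separate_total - combined_total
print(savings)  # 700 tokens saved overall
```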

So now I’m going to test scaling this up to 3 or 4 translations, staying below the 4096-token completion limit.


OK good. :+1:

Like I said, either one at a time, or small clusters of languages at a time (2-3 languages max per call).

Small clusters improve the SNR and therefore the reliability.

One at a time gives the best SNR. The more languages you add per call, the lower the SNR, but it may still be high enough to work reliably.
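Chunking the 15 target languages into small clusters is a one-liner (the language list here is illustrative):

```python
# Illustrative target list; substitute your actual 15 languages.
LANGUAGES = ["fr", "de", "es", "it", "pt", "nl", "pl", "sv",
             "da", "fi", "no", "cs", "hu", "ro", "el"]
CLUSTER_SIZE = 3  # 2-3 per call keeps the SNR high enough

clusters = [LANGUAGES[i:i + CLUSTER_SIZE]
            for i in range(0, len(LANGUAGES), CLUSTER_SIZE)]
# 5 API calls instead of 15, each resending the English source only once
```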



As suggested before, I’d try to simplify this.
Is there a specific reason you’re using multiple messages, or function calling at all?
I’d try putting all the instructions in one system message and simply asking for the translations in separate JSON attributes, then parsing the response and making the save calls myself. In my experience this often works better.
*It’s possible that I’m missing some reasoning behind it; sorry in advance if so.
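A sketch of that alternative; the system wording and JSON keys are my own assumptions, and the model’s reply here is a stand-in for a real response:

```python
import json

# One system message asks for all translations in a single JSON object.
system_message = (
    "You are a translation assistant for technical markdown. Translate the "
    "user's article into French and German. Respond only with JSON: "
    '{"fr": "<french markdown>", "de": "<german markdown>"}'
)

# Stand-in for the model's JSON-mode reply:
reply = '{"fr": "Bonjour le monde", "de": "Hallo Welt"}'

translations = json.loads(reply)
for lang, text in translations.items():
    pass  # call your own save_translated_article(lang, text) here
```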

Here’s a sample run that I did with those minor edits to your original prompt:


The first and immediate flaw is all the user messages.

  • The AI gives priority only to the last user message when following instructions; the others are treated more like past instructions or documentation.


  • There is no reason to complicate the output by using function calling the wrong way. The AI calls a function when it thinks the function can provide utility; it is not trained to treat a function as an output method.

  • (Worse, parallel-tool-call models have flaws handling accented and Unicode characters in AI tool outputs.)

A silly AI task like this will not work on much more than 50 source tokens, because the output is massively amplified (×15 for the other desired languages), compounded by the higher token usage of languages other than English:

That’s 417 output tokens. You can see the last two are repeating nonsense even if you can’t read them. The latest AI models become very reluctant to complete the task as it approaches 800 tokens. I of course advised to that effect in one of the several other topics started about this task.


That’s very simple, thank you.
