Messages I/O Growing, Now What?

I consider myself quite a novice, but I’ve been using the API for many months now and made some progress with text generation.

Context: I use OpenAI’s text generation to write introductory paragraphs based on a fairly small body of text. Imagine somebody gives me the basic content of a 500-word article that is missing an introduction, and I use the ChatGPT API to generate the introductory paragraph. I’m doing stuff like that.

I have found, through many iterations of use, that I have to limit the scope of my question or the desired shaping of the AI response when sending input. In other words, I cannot get my ideal response unless I break the input/output messaging up into smaller concerns.

Can I assume this is normal and the conclusion most AI API users arrive at?

As a result there is a lot of round tripping to the API going on. For instance, based on my desire for an introductory paragraph above I will:

  1. Build a list of blacklisted words and create a tokens array for use with logit_bias (used throughout messages)
  2. Set system tone and expectations
  3. Send the 500-word article and ask for an outline
  4. Ask for an introductory paragraph based on the outline
  5. Ask for grammatical fixes
  6. Ask for SEO optimization
  7. Ask for tone fixes

The above order of operations is a skeletal representation. There’s a lot more going on with each ask of the API and some cleaning up on my part.
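Step 1 above can be sketched in plain Python. The token ids below are hypothetical placeholders; in practice you would look them up with a tokenizer for your chosen model, and a bias of -100 effectively bans a token.

```python
def build_logit_bias(blacklisted_token_ids, bias=-100):
    """Map each token id to a strong negative bias so the model avoids it."""
    return {token_id: bias for token_id in blacklisted_token_ids}

# Example: suppose the tokenizer mapped our blacklisted words to these ids.
hypothetical_ids = [1234, 5678, 91011]
logit_bias = build_logit_bias(hypothetical_ids)

# The resulting dict can then be passed on every call, e.g.:
# client.chat.completions.create(..., logit_bias=logit_bias)
```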

The results have gotten significantly better as I have broken things up into multiple messages, each one attempting to solve a smaller part of my overall goal.

The only downside is the growing latency. This is only for my personal use, but still…if I add a few more input/output cycles I can see the wait time growing past a minute.
Am I approaching this the right way?
As a beginner with some experience; now what?
What is the next logical step or direction to take my learning/project in?

Thank you kindly.

3 Likes

Hi @bionary !

I would say you have the right approach. My finding is also that if I want more reliable (and better-quality) output, I should chunk the problem up into specific sets of inputs/outputs, rather than doing everything in one go.

And theoretically this is well grounded: too much information at once means effectively losing some of those tokens (their logit values become too small to be sampled meaningfully).

Also, for creative writing, I usually have to do multiple turns for each one of those components. So I think you are on the right track!

2 Likes

That’s encouraging @platypus thanks!

Based on reading docs and other sources it seems like there are so many features that I haven’t even tried. “Functions”, “stored Assistants”, etc…

It’s difficult to make sense of these newer advanced features. Are any of these features (plus others I didn’t mention) worth looking into to solve my text-generation goals?

Honestly, I got to this point and have been thinking…there’s got to be more! Smarter ways to do what I’m doing… Ways to get better output…etc.

When devs build these text-writing wrappers is my technique mentioned above all they are doing? There’s got to be more to it!

2 Likes

In terms of the quality of text generation output, that’s basically the best baseline you can get. Focusing on a single concern in that one call, with the right prompt, will get you very far.

So as I mentioned, the only time I’ve seen measurable improvement is with multiple turns. E.g. let’s say you are trying to set the optimal tone - you may want to do 2-3 passes before selecting the best response. This, of course, adds latency, because you can’t parallelize it: you improve the previous response by stuffing it into the context of the subsequent call.
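The multi-pass idea can be sketched as a loop where each pass feeds the previous draft back in. Here `revise` is a stand-in for a real chat completion call, passed as a plain function so the control flow is clear without any API access:

```python
def refine(draft, revise, passes=3):
    """Run several sequential improvement passes over a draft."""
    for _ in range(passes):
        draft = revise(draft)  # each call sees the previous output
    return draft

# Toy stand-in for the model: just appends a marker per pass.
result = refine("intro", lambda text: text + " +pass", passes=2)
# result == "intro +pass +pass"
```

In a real version, `revise` would wrap a chat completion call whose prompt includes the previous draft; the sequential dependency is exactly why this step can't be parallelized.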

Along those same lines, you can explore “agentic workflows”. Essentially you define a bunch of agents - one is generating an outline, another is writing the intro, another is fixing the grammar, etc. And then you have an agent, or even better, a “panel of agents”, that acts as an editor and performs some kind of voting on the final output. If they are not happy with it, they output the reason and which agent it should be sent to for improvement (e.g. maybe the “grammar” agent needs to tighten things up). This whole workflow is repeated until the “editorial agent(s)” accept the output.

All of this adds latency and cost, but it’s basically the next level in advancing your work.
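The “panel of agents” vote could look something like this. Each panelist here is a stand-in function returning an (accept, reason) pair; a real version would wrap one API call per panelist, and the reasons would be routed back to the relevant writing agent:

```python
def panel_verdict(text, panelists):
    """Collect (accept, reason) votes; return the majority decision and the rejection reasons."""
    votes = [panelist(text) for panelist in panelists]
    accepts = sum(1 for accepted, _ in votes if accepted)
    reasons = [reason for accepted, reason in votes if not accepted]
    return accepts > len(votes) / 2, reasons

# Hypothetical panelists with fixed opinions, for illustration only.
panel = [
    lambda t: (True, ""),
    lambda t: (True, ""),
    lambda t: (False, "tighten the grammar"),
]
accepted, feedback = panel_verdict("draft intro", panel)
# accepted is True; feedback == ["tighten the grammar"]
```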

2 Likes

Thank you again @platypus . Your insight led me to more learning!

I suppose I should mention that my code needs to run autonomously; as such, each turn (prompt/response) builds upon the previous with no human interaction.

The “agentic workflows” approach looks ideal! But the more I read, the more I’m slightly confused. “Agent” is not a feature in the OpenAI docs.

From what I can tell, the term “Agent” seems to mean any AI source that is specialized in solving a narrow set of tasks. Does this sound correct?

Since I am using only OpenAI could you provide a bit more context as to where to define these “agents” and how to reference them?

Again, I’m trying to stick to just OpenAI API right now.

1 Like

No problem, yes it’s perhaps somewhat confusing.

And yes you can remain completely within the OpenAI ecosystem. An “agent” is just a prompt with a very specific/targeted instruction. It can have more bells and whistles, like being able to access/query other data sources, but it doesn’t have to.

In terms of the OpenAI environment, it can be Assistants API, but it can also just be a bunch of standard ChatCompletions API calls, where each call is wrapped in a function and does a very specific thing.

To give you a super simple example - here is a two-agent system, where one agent produces an introduction paragraph, and another agent “scores” it. The cycle ends once a maximum number of turns has been reached or once the scoring agent gives the thumbs up.

And note that all of this can be executed sequentially - so you can run this in a single Jupyter notebook. You can make it as complex as you like.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def intro_writing_agent(outline_text: str) -> str:
    """Given an outline of a document, generate and return an intro paragraph."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an expert at producing short introductory paragraphs, given an outline of a document."},
            {"role": "user", "content": outline_text}
        ]
    )
    return completion.choices[0].message.content


def intro_scoring_agent(intro_text: str) -> bool:
    """Given an introductory paragraph, return True if the paragraph is good enough for final inclusion."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an expert editorial system that decides whether an introductory paragraph is good for final inclusion. Answer with only the word True or False."},
            {"role": "user", "content": intro_text}
        ]
    )
    # Normalize the model's reply to a boolean
    return completion.choices[0].message.content.strip().lower().startswith("true")

MAX_NUM_TURNS = 5
num_turns = 0

while num_turns < MAX_NUM_TURNS:
    intro_text = intro_writing_agent(outline)
    if intro_scoring_agent(intro_text):
        break
    num_turns += 1

# you now have your final introductory paragraph

This is a super naive example, but gives you an idea.

Definitely check out the structured outputs you linked. And also look at how multi-agent systems can be better formalized using this: GitHub - openai/swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

2 Likes

@bionary - one other quick point. You might also want to give fine-tuning a try. It caters well to use cases like yours that involve generating output in a specific style and tone. It might enable you to consolidate a few of the steps into a single one; plus it would also decrease latency.
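For orientation, chat fine-tuning data is a JSONL file with one complete example conversation per line. The wording below is purely illustrative:

```python
import json

# One hypothetical training example: system style instruction, a sample
# outline as the user turn, and the desired intro as the assistant turn.
record = {
    "messages": [
        {"role": "system", "content": "You write short introductory paragraphs in our house style."},
        {"role": "user", "content": "Outline: why composting matters; basics; getting started."},
        {"role": "assistant", "content": "Composting turns everyday scraps into rich soil..."},
    ]
}

# Each training example becomes one line of the .jsonl upload file.
jsonl_line = json.dumps(record)
```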

2 Likes

Unfortunately I code in PHP only, but I can still understand the Python code you provided.

Based on the code you provided, it looks like you are separating the chain of concerns into separate chat instances. Otherwise it looks very similar to what I am doing already. It’s as if each agent is nothing more than its own chat instance, rather than part of the ongoing chat/messages I am currently using.

I suppose this “agentic” way allows you to contain and shape the GPT settings + role:system per concern. My current way can only set the GPT settings once at the start; then I rely on my role:user messages to shape the outputs, carefully rebuilding the chat/messages array and sending it back each time. I was afraid of “losing context”, but I suppose there is no context needed if each agent has a simple set of rules and objectives.

Furthermore, I suppose each “agent” can have its own back-and-forth (array of messages), like I am currently doing, to arrive at results. And depending on the requirements, I can see that this agentic workflow might also allow for some parallel processing. hmmm.

Again, I am grateful for your advice and examples @platypus

Can you elaborate on this?
As of now my calls contain a text list of instructions similar to this:

- Please write an introduction for the article I will provide.
- You present to the reader the overall concepts without getting too detailed.
- No Markdown
- No emojis
- No HTML
- Paragraphs: 2-4
- Here is the article:

$article
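The instruction list above can be packed into a chat call roughly like this (a Python sketch; `article` stands in for your `$article` value):

```python
INSTRUCTIONS = """\
- Please write an introduction for the article I will provide.
- Present the overall concepts to the reader without getting too detailed.
- No Markdown
- No emojis
- No HTML
- Paragraphs: 2-4
- Here is the article:
"""

def build_messages(article):
    """Combine the fixed instructions with the article into a chat messages array."""
    return [
        {"role": "system", "content": "You are an expert at writing introductory paragraphs."},
        {"role": "user", "content": INSTRUCTIONS + "\n" + article},
    ]

messages = build_messages("(article text goes here)")
```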

I’d like to know more about the functions you mentioned!

Thanks

Thank you for your input @jr.2509

I looked into fine-tuning a few months back but got scared away when the docs recommended trying well-engineered prompts first. Also, I’m looking for a lot more creativity in the writing.

One of the problems I encountered early on was actually with providing examples of writing. I found that the results almost became too formulaic when I provided examples. This became obvious after looking at hundreds of outputs across a couple of closely related topics.

Admittedly I was much less experienced when I first tried using writing examples; so maybe it’s a technique worth revisiting.

1 Like

Hi! The functions are basically along the lines of the code that I wrote, not much more complicated than that. But that was a super naive example; the idea was to show that you can have this bouncing back and forth between two agents until some condition is met.

Ideally, the “scoring” agent would output some score AND a reason why it scored it low (if that is the case). You would supply this reason back to the original agent, in the prompt, then take the output and send it back to the scoring agent, repeating until the score was high or a stopping condition was met. You can do this for all aspects of text generation.
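The score-and-reason loop can be sketched like this. Here `write` and `score` are stand-ins for the two agent calls; `score` returns a (score, reason) pair, and the reason is fed back into the writer on the next attempt:

```python
def write_with_feedback(write, score, threshold=8, max_turns=5):
    """Iterate writer -> scorer, feeding the scorer's reason back, until the score clears the threshold."""
    feedback = ""
    draft = ""
    for _ in range(max_turns):
        draft = write(feedback)
        value, reason = score(draft)
        if value >= threshold:
            break
        feedback = reason  # next attempt sees why this one fell short
    return draft

# Toy stand-ins: the scorer only approves once the draft mentions "concise".
def toy_writer(feedback):
    return "A concise intro." if feedback else "A rambling intro."

def toy_scorer(draft):
    return (9, "") if "concise" in draft else (3, "make it concise")

final = write_with_feedback(toy_writer, toy_scorer)
# final == "A concise intro."
```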

So it’s just automating what you already started. And going back to my original statement - I think you are definitely on the right track. I leave it as an exercise for you to expand on this :wink:. And really looking forward to seeing what you will create!

2 Likes

This conversation has been amazingly encouraging. I’ve learned a bunch of new terms and concepts. Thank you again @platypus
Now I have much more learning to do!

2 Likes