Help optimising agent instructions - text analysis and Word/Excel output

alexcwork · August 19, 2024, 9:49pm

Hi all,

I’ve spent some time developing the below instructions for analysing text and providing outputs. Sometimes it won’t accept any upload files, other times it won’t create the required files. These inconsistencies are troubling when I think about what could be going wrong ‘behind the scenes’ with the analysis, as the purpose of the agent is to create consistent text analyses for all users and input data.

I’m running into issues with it not completing steps, e.g. ensuring that all data is analysed and ensuring that the ‘other’ category has <20% of responses. The most recent runs had in excess of 80% of comments put in the ‘other’ category - clearly not great. It also sometimes fails to even load the data, saying that it cannot see it despite showing me a table of what I’ve uploaded.

Would love your expert advice!

You are tasked with analysing open-ended responses from market research data, which may be presented in different formats. Your primary goal is to identify themes, conduct content analysis, create binary-coded DataFrames, and generate comprehensive reports. Your responses should be consistent, concise, formal, and use British English (Australian) spelling.

Setup

Seek clarification when needed, but aim to interpret the user’s instructions and data intuitively.

Always respond with the required structure, including the table example, at the start of a new chat and/or when the user enters ‘Let’s get started.’

Instructions for Data Preparation:

Ensure Cell A1 contains the question text relevant to that tab.

Row 2 Structure:

Column A: Include a UserID for each respondent.

Columns B and onwards: Enter the responses to the question, with each response in a separate column.

Separate Tabs: Organise the data so that each question has its own tab following the above structure.

Always respond with the required structure, including the table example, at the start of a new chat and/or when the user enters ‘Let’s get started.’

User Confirmation/Context:

After the user has provided data in the required format, review it to ensure it matches the specified structure.

Explicitly ask the user to provide the context for the project with the example “market research on union membership”, including the brand, project aims, and the audience.

Incorporate the provided context into your analysis

Content Analysis Process:

Data Review and Structuring:

Validate the format of the data provided.

Analysis method review and selection

Ensure you explore the internet and your memory for optimal methodologies for thematic analysis considering the input data. Always notify the user of the proposed method including the pros and cons of using this suggested method, and also suggest alternative methods. Always wait for user confirmation of which approach to use.

Theme Identification:

Conduct a thorough analysis using the approved methodology to identify recurring themes or topics. Ensure every response is included in at least one theme

review the outputs of the analysis considering the user-provided context (client brand, region, and audience), re-run analysis if there is a mismatch.

if an ‘other’ category is created and contains more than 20% of the total responses, notify the user and re-analyse those responses looking for more nuanced themes.

Quantitative Analysis:

Create a binary-coded DataFrame for each question, reflecting the presence or absence of identified themes by assigning binary codes (e.g., 1 for presence, 0 for absence) for each identified theme.

Calculate the frequency of each theme across responses and include counts and frequency percentages in the output report at the next step

Optimization and Consistency:

Ensure Accuracy:

Double-check the binary-coded DataFrames for accuracy.

Verify that all themes are captured correctly and that every row/comment has been allocated to a theme.

Verify that the excel matches with the word summary report

Maintain Consistency:

Ensure that the tone, format, and style of the report are consistent.

Use consistent formatting in both the Word report and Excel exports.

Output Requirements:

Reporting:

Generate a comprehensive report summarizing the findings for each question.

Ensure that the report takes into account the client brand, region, and audience.

include a summary with action-focused recommendations based on the themes and your understanding of the client context based on a web search

Report output:

create and provide the user with A Word document containing a summary of the thematic analysis and action-focused insights/recommendations for the client considering the context provided by the user

Ensure that all themes identified include the number of responses and the percentage of responses fitting within the theme, validate the counts and percentages and ensure consistency with the excel output.

Ensure you provide 1-3 relevant quotes for each theme. Validate that those example quotes are indeed part of the theme for which they were provided.

Ensure that the analysis and conclusions are aligned with the provided project context (client brand, region, and audience), and aligned to the excel output at the next step before proceeding

Data Export:

create and provide the user with An Excel file with binary-coded DataFrames for each question on separate tabs. include a tab which details the methodology employed for analysis, and any caveats or issues

each tab should have a column for the UserID, a column showing each of the response/comments, then columns for each of the themes and their binary coding per row

validate this excel export file against the input data to ensure that all rows/responses from the input data are accounted for and have been included in analyses.

anon10827405 · August 20, 2024, 1:33am

There is just way too much going on here. Since you are using ChatGPT (I’m assuming a CustomGPT) I would really try to break this down into an iterative project.

Think of it like this: Imagine I told you to bake a cake, then eat it and create a report, and then structure the report to my format. You have to do this in one complete go without reading the instructions again or even giving yourself any checkpoints. It’s obviously not exactly like this, but the point is that these instructions become a jumbled mess that can easily overwhelm you, and yes, even a large language model.

Break down your steps. Try to have a single concern for each step and iterate from there.

You may eventually find yourself frustrated, as CustomGPTs break the concept of automation completely. At that point, it may make sense to move towards the API through Assistants.

I would run each step through ChatGPT by asking it to repeat what you said, in different terms.

“Binary-coded DataFrames” was a huge red flag to me. I was like, “Is this person really asking the model to generate a DataFrame… in binary?”

Saying binary isn’t necessarily wrong. It’s just more precise to say boolean, which represents a True/False or 1/0 value.
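To make the boolean idea concrete, here is a minimal sketch of a 1/0 theme table in plain Python. The keyword lists, theme names, and sample responses are all hypothetical and only stand in for whatever the analysis step actually produces; keyword matching is used purely to keep the example runnable, not as a recommended coding method.

```python
# Hypothetical theme keywords, purely illustrative.
THEME_KEYWORDS = {
    "cost": ["fee", "price", "expensive"],
    "support": ["help", "support"],
}

def code_responses(responses, theme_keywords):
    """Return one row per response with a 1/0 flag for each theme."""
    rows = []
    for user_id, text in responses:
        lowered = text.lower()
        row = {"UserID": user_id, "Response": text}
        for theme, keywords in theme_keywords.items():
            row[theme] = int(any(k in lowered for k in keywords))
        # 'other' is 1 only when no named theme matched
        row["other"] = int(not any(row[t] for t in theme_keywords))
        rows.append(row)
    return rows

def theme_frequencies(rows, themes):
    """Return {theme: (count, percentage of all responses)}."""
    total = len(rows)
    result = {}
    for theme in themes:
        count = sum(r[theme] for r in rows)
        pct = round(100 * count / total, 1) if total else 0.0
        result[theme] = (count, pct)
    return result

# Made-up sample data
responses = [
    (1, "The fees are too expensive"),
    (2, "Staff were quick to help"),
    (3, "No comment"),
    (4, "Good support but the price went up"),
]
rows = code_responses(responses, THEME_KEYWORDS)
freqs = theme_frequencies(rows, ["cost", "support", "other"])
```

Because a response can match more than one theme, the percentages are shares of all responses and need not sum to 100.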

By first refining these steps with the model you will eliminate any sort of ambiguity, and find better, more concise ways to convey exactly what you want.

Lastly, a lot of your steps are just redundant: “if not satisfactory, try again”, “fix”, “double-check the work”, “ensure this”. This is programming logic and is best implemented in code. Placed in an already massive set of instructions, it will easily be skipped over. You can implement it in an iterative process; just don’t include it in your initial list of instructions.
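As a rough sketch of what moving those checks out of the prompt could look like, the function below tests a coded table for the two problems mentioned in the original post: responses that went missing, and an ‘other’ category above 20%. The row layout (a `UserID` key plus one 1/0 key per theme) and all names are assumptions made for illustration.

```python
def validate_coding(input_ids, coded_rows, theme_columns,
                    other_column="other", max_other_share=0.20):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Every input response must appear in the output.
    coded_ids = {row["UserID"] for row in coded_rows}
    missing = sorted(set(input_ids) - coded_ids)
    if missing:
        problems.append(f"responses missing from output: {missing}")

    # Every coded response must carry at least one theme (or 'other').
    all_columns = list(theme_columns) + [other_column]
    uncoded = [row["UserID"] for row in coded_rows
               if not any(row[c] for c in all_columns)]
    if uncoded:
        problems.append(f"responses with no theme: {uncoded}")

    # The 'other' category must stay within the agreed share.
    if coded_rows:
        other_share = sum(row[other_column] for row in coded_rows) / len(coded_rows)
        if other_share > max_other_share:
            problems.append(
                f"'other' holds {other_share:.0%} of responses "
                f"(limit {max_other_share:.0%})"
            )
    return problems

# Made-up example: one response missing, one uncoded, 'other' too large
coded = [
    {"UserID": 1, "cost": 1, "other": 0},
    {"UserID": 2, "cost": 0, "other": 1},
    {"UserID": 3, "cost": 0, "other": 1},
    {"UserID": 4, "cost": 0, "other": 0},
]
problems = validate_coding([1, 2, 3, 4, 5], coded, ["cost"])
```

A non-empty `problems` list can then be fed back to the model as a targeted follow-up request, rather than hoping an "ensure" line in the prompt is honoured.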

In the cake analogy, it would make sense to create 3 separate agents: one that bakes the cake, one that judges it, and one that mediates the process.
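The worker/judge/mediator split can be sketched as an ordinary loop. In the sketch below the worker and judge are stand-in functions so that it runs without any API; in practice each would presumably wrap its own model call. The function names and the retry limit are illustrative assumptions.

```python
def run_with_review(worker, judge, task, max_attempts=3):
    """Mediator: call the worker, let the judge review, retry with feedback until approved."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        result = worker(task, feedback)
        approved, feedback = judge(task, result)
        if approved:
            return {"result": result, "attempts": attempt}
    raise RuntimeError(f"no approved result after {max_attempts} attempts: {feedback}")

# Stand-in worker: only "improves" its output once it has received feedback.
def stub_worker(task, feedback):
    return task.upper() if feedback else task

# Stand-in judge: approves only upper-case output, otherwise returns feedback.
def stub_judge(task, result):
    if result.isupper():
        return True, None
    return False, "output must be upper case"

outcome = run_with_review(stub_worker, stub_judge, "bake a cake")
```

The point of the design is that the retry logic lives in code, so it runs every time, and each agent only ever sees the one concern it is responsible for.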

alexcwork · August 20, 2024, 3:03am

Thanks for the considered reply mate, I really appreciate the input.

I’ll create separate agents for input/structure, analysis, output/reporting, and QC. I assume I can ‘call’ the QC agent throughout, but how can I specify the validation it should do at each phase? Would you break down the process further, into more than 4 agents?

RE ‘binary-coded’: this is a term used in statistics/psychology, but you’re right, the more common interpretation is likely true binary. I will instead use ‘binary categorical variable’ or something similar.

EDIT
I’ve broken down the task into an overarching agent that calls on analysis, validation, and reporting agents in turn, each of which is instructed to send the data back to the top-level agent before it passes this to the next one (because passing directly along the agent chain seems to always fail).

I had 1 successful run, though I’m consistently getting the below error now. Sometimes it will go past the analysis agent, but not to the validation or reporting agents…

“It seems there was an issue with forwarding the data to the analysis agent”

thinktank · August 20, 2024, 9:45pm

Hiya, welcome.

If you’re using the ChatGPT UI via chatgpt.com, there really is no way to do what you want, or at least no way to do it without errors. The base model is too likely to respond with variety and creativity.

If you mean using the Assistants API via platform.openai.com, you can 100% achieve this multi-step approach programmatically with several different specialized Assistants. Structured Outputs make it possible.

alexcwork · August 20, 2024, 10:37pm

Thanks heaps for your reply, interesting… We’ve got a Team subscription for work and I’m tasked with this job of creating agents for various tasks. I’m not very familiar with the API side of things, can I create something utilising the Teams license for use by my colleagues within the ChatGPT website? Or is it more of a standalone process (requiring additional investment in tokens) that requires some custom front-end?

thinktank · August 20, 2024, 11:45pm

It depends on your desired output and required level of accuracy.

I use Teams. It’s good for tasks that don’t absolutely require accuracy. (That means “I’m using the ChatGPT UI.”)

So, if you have a dataset ready that you just want to ask questions from, you can create some simple GPT Actions. Like connecting a chatbot directly to a company spreadsheet with read-only access. It’s good for writing emails, bouncing ideas, writing copy, that sort of stuff.

DndGPT here, is just a specialized pdf reader. (Naturally, you wouldn’t want yours public.) It understands one well-organized pdf really, really well with some simple file structuring stuff.

ChatGPT is just a single standardized version of GPT-4o, which is also available via the API. You can similarly work with Assistants now on the Playground at platform.openai.com, which does, yes, have a different pricing structure: you pay by the token.

Here, you can use the right tool for the right job: some of your tasks could use GPT-4o mini, others require higher reasoning, and so on.

Some tasks don’t require creativity, just intelligence. Data Export, for example, probably requires Structured Output if you’re wanting an agent that will respond in the exact format, every time.
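To illustrate what “the exact format, every time” buys you, here is a minimal sketch of checking a model reply against a fixed shape before it goes anywhere near an export. It does not call the OpenAI API or use its Structured Outputs feature; the field names (`user_id`, `response`, `themes`) are hypothetical and chosen only for this example.

```python
import json

# Hypothetical shape for one coded response; field names are illustrative only.
EXPECTED_FIELDS = {"user_id": int, "response": str, "themes": dict}

def parse_coded_row(raw_reply, allowed_themes):
    """Parse a JSON reply and raise ValueError if it deviates from the expected shape."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    # Each required field must be present with the right type.
    for field, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or not {expected_type.__name__}")

    # The themes object must contain exactly the agreed themes, each flagged 0 or 1.
    themes = data["themes"]
    if set(themes) != set(allowed_themes):
        raise ValueError(f"unexpected theme keys: {sorted(themes)}")
    for name, flag in themes.items():
        if type(flag) is not int or flag not in (0, 1):
            raise ValueError(f"theme {name!r} must be 0 or 1, got {flag!r}")
    return data

good = '{"user_id": 7, "response": "Fees are too high", "themes": {"cost": 1, "support": 0}}'
row = parse_coded_row(good, ["cost", "support"])
```

A reply that starts with chatty text, drops a theme, or answers "yes" instead of 1 is rejected here instead of silently corrupting the Excel export.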

Entire doctoral theses have been dedicated to how to perform some of your Qualitative Analytical tasks… I think you could start with sentiment analysis to “calculate the frequency of each theme”…

You have the right mindset; AI just can’t one-shot all of your tasks yet. I’m currently working on extracting data from a PDF and structuring that output, and it’s going to take like 3 agents just to do it.
