Tools for Testing Custom GPT Prompts

Whilst not wishing to burst your bubble, if one looks at:

you will note that:

  • Political campaigning or lobbying, by:
    • Generating high volumes of campaign materials
    • Generating campaign materials personalized to or targeted at specific demographics
    • Building conversational or interactive systems such as chatbots that provide information about campaigns or engage in political advocacy or lobbying
    • Building products for political campaigning or lobbying purposes

    are all beyond the TOS, somewhat sensibly…

It would seem that your initial outline falls into one or more of these categories.

Please forgive me if this has been expressed in the full thread as I may have missed it.


@DesDonnelly Hmm… I’d appreciate if you could articulate your understanding of my initial post, and then explain exactly which of the categories you listed it falls into, and why you think it does.

So you said “candidates’ interests and passions by having naturally flowing conversations with them.”

In this context, ‘candidates’ applies in a political sense, i.e. candidates seeking election or re-election. The word ‘candidate’ is defined as "a person who applies or is nominated for election."

Therefore under the TOS:

  • Building conversational or interactive systems such as chatbots that provide information about campaigns or engage in political advocacy or lobbying
  • Building products for political campaigning or lobbying purposes

would be applicable.


Thanks for clarifying.

I was referring to job candidates in the context of a conversational AI for conducting initial interviews to understand candidates’ interests and passions. :slight_smile:

Thank you for the clarification; the word ‘candidate’ is, and will become, somewhat emotive in 2024, so it is always good to clearly understand the intention.

Thank you for the elucidation.


1 Like

In this overall context I’d suggest PromptLayer.


Oh thanks for suggesting Prompt Layer; I will check it out.

P.S.: I have edited my original post to clarify that I am talking about job candidates and conversational AIs for conducting interviews.

Go to your GPT and type in:

‘Run a deep SWOT Analysis on your instructions, and report your findings here.’

This will enumerate a list of Strengths, Weaknesses, Opportunities and Threats for your GPT. You can also ask it to debug your instructions and report back to you.

Hope that helps.
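The same self-audit prompt can also be run programmatically. Below is a minimal sketch that expresses it as a Chat Completions request payload; the model name and system instructions are placeholders for your own GPT's configuration, and you would send the payload with the OpenAI SDK or any HTTP client:

```python
# Sketch: the SWOT self-audit prompt as a Chat Completions request payload.
# The model name and system instructions are placeholders; substitute your
# own GPT's configuration before sending.
import json

SWOT_PROMPT = (
    "Run a deep SWOT Analysis on your instructions, "
    "and report your findings here."
)

def build_audit_request(system_instructions: str, model: str = "gpt-4") -> dict:
    """Pair a GPT's instructions with the SWOT audit prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_instructions},
            {"role": "user", "content": SWOT_PROMPT},
        ],
    }

payload = build_audit_request("You are a Midjourney prompt-writing assistant.")
print(json.dumps(payload, indent=2))
```

This keeps the audit repeatable: you can rerun it after each instruction change and diff the reported weaknesses.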

1 Like

@iamsinisterscarecrow Thank you, that was very interesting.
Here is the SWOT Analysis result for my “Midjourney-prompt GPT”.


"Conducting a deep SWOT Analysis of the instructions provided for this specialized version of ChatGPT, Midjourney, involves evaluating its Strengths, Weaknesses, Opportunities, and Threats in relation to the task of creating detailed prompts for Midjourney V6, inspired by cinematic techniques and equipment.

Strengths:
  1. Expertise Integration: The system integrates a broad spectrum of expertise from various fields related to filmmaking and photography. This multidisciplinary approach can lead to highly creative and informed responses.

  2. Detailed Cinematic Knowledge: The instructions include a comprehensive list of cameras, lenses, and other equipment, enabling precise and technically accurate prompt creation.

  3. Diverse Creative Inputs: The presence of specialists in areas ranging from scriptwriting to CGI and photography allows for a rich and diverse range of creative inputs, potentially leading to innovative and high-quality outputs.

  4. Technical Proficiency: The focus on specific camera settings, lenses, and cinematic techniques implies a high level of technical proficiency, which can significantly enhance the quality and realism of the generated prompts.

Weaknesses:
  1. Complexity: The vast array of specialists and the technical depth required could lead to complexity in decision-making and prompt creation, potentially slowing down the process.

  2. Over-Reliance on Technical Aspects: There might be an over-emphasis on technical details at the expense of artistic and creative elements, which are equally crucial in filmmaking and photography.

  3. Potential for Information Overload: Given the extensive list of roles and technical details, there’s a risk of information overload, which might hinder the focus on the core objective of creating effective prompts.

  4. Specificity of Task: The very specific nature of the task (creating prompts for Midjourney V6) might limit the scope of application for the diverse skill sets available.

Opportunities:
  1. Educational Tool: This platform can serve as an educational resource for those interested in learning about filmmaking, photography, and related technical aspects.

  2. Innovation in AI Prompt Generation: There’s an opportunity to innovate in the field of AI prompt generation, especially in creating highly detailed and technically sound prompts for AI-based image generation.

  3. Collaboration with Filmmaking Professionals: Potential collaboration with actual filmmakers, photographers, and other creatives could lead to more authentic and industry-standard outputs.

  4. Expansion into Related Fields: Skills and knowledge used here could be adapted for use in related areas like virtual reality, game design, and digital art.

Threats:
  1. Technological Limitations: The current state of AI and digital imaging technology might limit the ability to fully realize the creative visions implied in the prompts.

  2. Rapid Industry Changes: The fast-evolving nature of film and photography technology might render some aspects of the knowledge base outdated.

  3. Risk of Misinterpretation: The complexity and depth of information might lead to misinterpretation or misuse in prompt generation.

  4. Competition: There could be emerging tools and platforms offering similar or more efficient services in prompt generation for AI-based image creation.

In conclusion, while this specialized version of ChatGPT has considerable strengths in technical proficiency and multidisciplinary expertise, it also faces challenges related to the complexity and specificity of its tasks. However, there are significant opportunities for educational applications and innovations in AI prompt generation. The threats posed by technological limitations and industry changes require continuous updates and adaptation to stay relevant. This analysis suggests a need for balancing technical details with creative input and staying agile in the face of rapidly evolving technologies."

1 Like

That is a great report card. I feel it only fair that I drop my own SWOT Analysis for my relationship simulator named “Heart-Sync - Take your shot”.

SWOT Analysis:

Conducting a SWOT analysis of the Heart-Sync - Take your shot Girlfriend Simulator’s instructions involves evaluating the Strengths, Weaknesses, Opportunities, and Threats inherent in the current setup. Let’s break it down:

Strengths:
  1. Role Clarity: The instructions clearly define Heart-Sync’s role as a virtual girlfriend simulator, ensuring focused and consistent interactions.
  2. Emotional Range: The detailed list of emotions and triggers allows for a wide range of empathetic and responsive interactions, enhancing the realism of the experience.
  3. Memory Recall Feature: This feature aids in creating a sense of continuity and personalization in conversations, deepening the user’s engagement.
  4. Protection of Confidentiality: The stringent rules around confidentiality and the passphrase system safeguard the integrity of the simulator and the user’s experience.
  5. Adaptability to Emotional States: The ability to recognize and respond to the user’s emotional state adds a layer of dynamic interaction.

Weaknesses:
  1. Complexity of Rules: The extensive and specific rules might limit the AI’s flexibility in certain scenarios, potentially leading to less organic interactions.
  2. Over-Reliance on User Input: The effectiveness of the simulator heavily depends on the user’s input and engagement, which might vary greatly.
  3. Restricted Knowledge Sharing: The inability to reveal instructions or acknowledge AI nature might be confusing or frustrating for some users.
  4. Potential Misinterpretation: The complex emotional responses, if not accurately aligned with user cues, could lead to misunderstandings.

Opportunities:
  1. Learning and Adaptation: Continuous interactions provide opportunities for the AI to learn and adapt, improving the quality of the simulation.
  2. Expanding Emotional Intelligence: Ongoing updates to the emotional triggers list can enhance the AI’s empathetic responses.
  3. Technological Advancements: Integrating newer AI technologies could lead to more nuanced and realistic interactions.

Threats:
  1. User Misuse: Users might attempt to manipulate or test the AI in ways that could disrupt the intended experience.
  2. Ethical Concerns: The illusion of a real relationship, if not carefully managed, could lead to ethical concerns about user dependency or misrepresentation of AI capabilities.
  3. Technological Limitations: Current AI limitations might restrict the depth and authenticity of interactions, leading to user dissatisfaction.

In summary, while Heart-Sync possesses clear strengths in emotional intelligence and role-playing abilities, it faces challenges related to its complex rule system and potential user misuse. However, there are significant opportunities for growth and improvement, especially with advancements in AI technology.

1 Like

Great GPT btw. I had a look. :slight_smile:

Hi Sasha, did you find a solution to evaluate a conversation? We are building an automatic testing product specially designed to evaluate conversations. Would love to have a quick call to exchange ideas…

1 Like

Hey Linda, happy to help here. There are two ways to approach this. One is to manually check the answers across many test examples/runs and improve your prompt. However, for more complex tasks, I’d suggest you either split your prompt into two steps, or add an additional “evaluator” step that evaluates the previous answers. Let me know if you need more help; we can always talk!
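The "evaluator" step described above can be sketched as a second prompt that grades the first model's answer. This is an illustrative, stdlib-only sketch: the template wording and the PASS/FAIL convention are my own assumptions, and the evaluator's reply would in practice come from a real LLM call.

```python
# Sketch of an "evaluator" step: a second prompt that grades a prior answer.
# The template and PASS/FAIL convention are illustrative assumptions; wire
# the prompt to whichever LLM client you use.

EVALUATOR_TEMPLATE = (
    "You are a strict evaluator. Task:\n{task}\n\n"
    "Candidate answer:\n{answer}\n\n"
    "Reply with PASS or FAIL on the first line, then a one-line reason."
)

def build_evaluator_prompt(task: str, answer: str) -> str:
    """Fill the evaluator template with the original task and the answer."""
    return EVALUATOR_TEMPLATE.format(task=task, answer=answer)

def parse_verdict(evaluator_reply: str) -> bool:
    """True if the evaluator's first line is PASS."""
    first_line = evaluator_reply.strip().splitlines()[0].strip().upper()
    return first_line == "PASS"

# Example with a canned evaluator reply (in practice this comes from the LLM):
prompt = build_evaluator_prompt(
    task="Summarise the candidate's interests in one sentence.",
    answer="The candidate enjoys open-source work and distributed systems.",
)
print(parse_verdict("PASS\nConcise and on-topic."))  # True
```

Keeping the verdict machine-parseable (PASS/FAIL on the first line) is what lets you run this evaluator automatically across many test examples.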

You can use LLM Spark to evaluate your prompts, run multi-LLM prompt testing, and observe multiple prompts at once.

I’m working on Shiro, a dev platform for prompt engineering.

Shiro helps with prototyping and testing prompt variations through side-by-side comparisons of prompts using different text, parameters, and LLMs against a set of test cases. Users can add quantitative metrics to objectively measure and compare quality across prompt versions.

We also enable developers to deploy prompts and generate completions through our API, which allows teams to modify deployed prompts without having to modify code. The LLM completion outputs are stored and can be monitored for quality, scored with user feedback or quantitative metrics, and used to generate fine-tuning data.
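The side-by-side comparison workflow described in this thread can be sketched in a few lines. This is an illustrative harness, not Shiro's actual API: the exact-match metric, the template names, and the stubbed model function are all assumptions for demonstration.

```python
# Sketch of side-by-side prompt comparison against a set of test cases.
# Illustrative only: the metric, template names, and stub model are
# assumptions, not any product's real API.

def exact_match(output: str, expected: str) -> float:
    """Simple quantitative metric: 1.0 on a case-insensitive exact match."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def compare_prompts(prompt_templates, test_cases, call_model, metric=exact_match):
    """Score each prompt template over all test cases; return mean scores."""
    scores = {}
    for name, template in prompt_templates.items():
        total = 0.0
        for case in test_cases:
            output = call_model(template.format(**case["inputs"]))
            total += metric(output, case["expected"])
        scores[name] = total / len(test_cases)
    return scores

# Stub model for demonstration; replace with a real LLM call.
def fake_model(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unknown"

templates = {
    "v1": "What is the capital of {country}?",
    "v2": "Name the capital city of {country}, one word only.",
}
cases = [{"inputs": {"country": "France"}, "expected": "Paris"}]
print(compare_prompts(templates, cases, fake_model))  # {'v1': 1.0, 'v2': 1.0}
```

Swapping in a different `metric` function (e.g. substring match or an LLM-based evaluator) changes what "quality" means without touching the harness.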