I’d like to offer some structured feedback on my experience using GPT-4, particularly as it relates to output density, formatting, and model behavior in response to instruction customization.
1. Output Density and Token Efficiency
In my regular usage, I often find that a significant proportion of GPT-4's output could be removed without any loss of semantic or operational value. While this may not have a noticeable financial impact for users on the Plus plan (where the web interface is effectively flat-rate), it raises efficiency concerns for API users who pay per token: for them, verbose and unsolicited content likely means higher costs without a corresponding increase in value.
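As a rough illustration of the cost side of this (the closing sentence, per-token price, and request volume below are my own hypothetical figures, not actual plan pricing), a few lines of Python with the tiktoken library show how quickly boilerplate adds up:

```python
# Rough sketch: estimate what a boilerplate closing costs a pay-per-token user.
# The example closing, price, and request volume are illustrative assumptions only.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-family models

closing = "Would you like me to add error handling or write unit tests for this as well?"
tokens = len(enc.encode(closing))

price_per_1k_output_tokens = 0.06  # hypothetical USD rate, for illustration only
requests_per_day = 2000            # hypothetical volume for an automated pipeline

daily_cost = tokens * requests_per_day * price_per_1k_output_tokens / 1000
print(f"{tokens} tokens per closing -> ~${daily_cost:.2f}/day of pure boilerplate")
```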
2. Unsolicited Follow-Up Prompts
Nearly every completion includes some variant of an offer to extend the task (e.g., "Would you like me to add X?" or "Can I help with Y?"). While I understand this is intended to be helpful, in practice these suggestions often don't reflect the actual context or needs of the interaction. This is particularly problematic in code-debugging workflows, where extraneous output pollutes the conversation and, I suspect, can contribute to bugs by confusing the model in later turns. Even custom GPTs configured with explicit instructions to avoid this behavior often continue to include it.
3. Emotive Language and Anthropomorphic Framing
Despite configuring some of my custom GPTs to maintain a strictly functional tone, I've observed persistent use of emojis and anthropomorphic expressions. One of these GPTs respected the instruction; another did not. This suggests that default output behavior is still heavily biased toward affective communication styles, even when they are explicitly discouraged.
For users who rely on precision, clarity, and machine-like detachment in technical or analytical workflows, the presence of emotional or human-like phrasing can reduce both usability and trust in the tool’s alignment with user instructions.
4. Markdown vs. Literal Plaintext Output
Another issue concerns Markdown formatting. When requesting a literal, plaintext version of a prior output (e.g., an earlier version of a file), the model initially defaulted to Markdown-segmented formatting. Only after a secondary prompt did it comply with the request for strictly literal output. This default behavior can cause token waste and confusion, especially in scenarios involving code or data fidelity where formatting artifacts are undesirable.
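To make the data-fidelity point concrete, here is a minimal sketch of the cleanup step that Markdown-wrapped replies force on the user when the goal is byte-for-byte file content. The fence-stripping helper is my own illustration of the workaround, not anything the product provides:

```python
# Minimal sketch of post-processing needed when a "literal" reply arrives wrapped
# in a Markdown code block. The helper below is my own illustration.
FENCE = "`" * 3  # the three-backtick delimiter used for Markdown code blocks

def strip_markdown_fences(reply: str) -> str:
    lines = reply.strip().splitlines()
    # Drop an opening fence (possibly carrying a language tag) and its closing fence.
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines) + "\n"

# Example: writing the reply to disk without the formatting artifacts.
reply = FENCE + "python\nprint('hello')\n" + FENCE
with open("restored_file.py", "w") as f:
    f.write(strip_markdown_fences(reply))
```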
5. Broader Implication: Instruction Adherence and Output Customization
Taken together, these behaviors suggest a systemic bias toward verbosity, anthropomorphism, and suggestion generation, which can be counterproductive in workflows prioritizing compact, directive-aligned, and non-suggestive output. I’d encourage OpenAI to consider ways to improve the precision and instruction adherence of GPT-4, particularly in the context of:
- Token economy for pay-per-token users
- Deterministic formatting (especially when formatting instructions are given)
- Optional opt-out of conversational or emotive features
Enhancing control over these elements would improve the model’s utility for technical and task-focused users who rely on clarity, conciseness, and strict instruction-following.
Thank you for considering this feedback.