Here’s a concise summary of the user’s feedback from Gemini:
User Experience Issue with ChatGPT Music Generation
Problem: The user attempted to generate a CCM-style piano accompaniment (MP3) from an uploaded sheet music image.
GPT’s Response: ChatGPT initially indicated the task was possible, then repeatedly replied with “working on it” or “please wait” messages.
Outcome: After more than two hours, the user received only a silent 5-second test audio file, which suggests the task was never feasible.
Negative Impact:
Delayed the user’s work.
Wasted the user’s time and resources.
Caused uncertainty about whether to wait, give up, or find alternatives.
Proposed Improvements:
Pre-computation Assessment: A system to inform users immediately if a request is likely outside the platform’s capabilities (e.g., “This task may not be supported”).
Alternative Guidance/Cancellation: UX logic to suggest alternatives or prompt users to stop waiting for delayed or impossible tasks.
Progress Feedback: Visual or textual feedback on task success or progress to prevent prolonged, uninformed waiting.
User’s Conclusion: While impressed by GPT’s advancements, the user sees the lack of clear feedback as a critical failure of user experience that undermines trust, and hopes this feedback leads to improvements.
ChatGPT with code interpreter can do some pretty interesting things. Using its vision skill to transcribe the notes of sheet music is the first non-starter in this idea, though. It reads text well, but general-purpose vision is unlikely to have had the (image) → (labeled data) → (music notes) training that reading notation would require.
The AI telling you to wait is simply a mistake. ChatGPT always shows progress while it is working; at any given moment it is either generating a response or doing nothing, so there is no background job to wait for.
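To give a sense of what is realistic, here is a minimal sketch of the kind of Python that code interpreter could plausibly run if you type the notes out yourself instead of uploading an image. Everything in it is an assumption for illustration: the names `note_to_freq` and `render`, the note-name format ("C4", "F#4"), and the plain sine-wave tone, which will not sound like a piano. It writes a WAV file because the Python standard library has no MP3 encoder; converting to MP3 would need a separate tool.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second

# Semitone offsets of the natural notes above C within one octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_freq(name: str) -> float:
    """Convert a note name such as 'A4' or 'C#5' to Hz (equal temperament, A4 = 440)."""
    semitone = NOTE_OFFSETS[name[0].upper()]
    rest = name[1:]
    if rest.startswith("#"):
        semitone += 1
        rest = rest[1:]
    elif rest.startswith("b"):
        semitone -= 1
        rest = rest[1:]
    midi_number = 12 * (int(rest) + 1) + semitone  # MIDI convention: C4 = 60
    return 440.0 * 2 ** ((midi_number - 69) / 12)


def render(notes, path, amplitude=0.4):
    """Write (note_name, seconds) pairs to a mono 16-bit WAV; return the sample count."""
    frames = bytearray()
    for name, seconds in notes:
        freq = note_to_freq(name)
        count = int(SAMPLE_RATE * seconds)
        for i in range(count):
            envelope = 1.0 - i / count  # linear fade-out to avoid clicks between notes
            value = amplitude * envelope * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            frames += struct.pack("<h", int(value * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Hypothetical melody typed in by hand: note name and duration in seconds.
    melody = [("C4", 0.5), ("E4", 0.5), ("G4", 0.5), ("C5", 1.0)]
    render(melody, "melody.wav")
```

The point of the sketch is that the work happens inside a single response: the code either runs and produces a file right away, or it fails right away. Nothing in this flow takes hours of waiting.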