Recently, I started a new open-source project called SongGPT that explores the potential of Language Models in generating original and customizable musical compositions. However, I wanted to ask people who actually understand how LLMs work: is this even a good idea? Or would domain-specific models be a better approach, since text-based models will never have a true understanding of “music”?
Basically no. LLMs aren’t trained on audio. Roughly speaking, the only way one could ‘learn’ how to write music is by ‘reading’ sources where the notes are listed, e.g. it reads an article saying ‘play Für Elise with these notes: E D# E D#’ etc. etc.
In the future, AI models will begin to incorporate audio, and then you might be able to try it. But it’s a long way off for now. Save your project for 5 years’ time.
This only uses a simple system message before calling the ChatGPT API. I can imagine that if someone puts a bit more effort in here, it could become something. No?
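For context, the “system message + API call” setup described above can be sketched roughly like this. This is a minimal illustration, not SongGPT’s actual code: the prompt wording, the choice of ABC notation as the output format, and the `build_song_request` helper are all my own assumptions.

```python
# Sketch of a text-only composition request (hypothetical prompt wording).
def build_song_request(style: str, key: str) -> list[dict]:
    """Build chat messages asking the model to compose in ABC notation."""
    system_message = (
        "You are a songwriting assistant. Respond only with a melody "
        "written in ABC notation, including the X, T, M, and K header fields."
    )
    user_message = f"Write a short {style} melody in the key of {key}."
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]

messages = build_song_request("folk", "G major")
# The messages list would then be passed to the chat endpoint, e.g.
#   client.chat.completions.create(model="gpt-4", messages=messages)
# (not executed here, since it requires an API key and network access).
print(messages[0]["role"], "-", messages[1]["content"])
```

The point of the ABC-notation constraint is that the model emits symbolic, text-based music rather than audio, which sidesteps the “LLMs aren’t trained on audio” objection above, though it still limits the model to whatever notated music appeared in its training text.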