I just wanted to share my very first project using the OpenAI APIs. It is very modest, mainly due to my beginner level in programming.
My four children are fond of bedtime stories, but they have always preferred “original” made-up stories rather than stories from a book. Together we have built some recurring characters, the favorites being Fifi the Girafe and Rhino the Rhinoceros.
So when I discovered ChatGPT, I tried it out and built a few “Fifi and Rhino” stories with the application, with surprisingly entertaining results. As a side effect, it also sparked my eldest’s interest in Computer Science, programming, and AI (yay !!).
I decided to build on this opportunity, so I set up an API key and an Angular app (TypeScript) to generate only those types of stories on my own website, with the ability to save and edit them (proofreading and context inference) and also to add some illustrations using DALL-E.
Basically, the app has:
A “collection” page where you can read all the previous stories (public part)
A “story” page where you fill out a form (place, purpose, companion, temperature, model, illustrated or not) and generate a text completion with Davinci and an image with DALL-E (private, for $$ reasons; see the sketch after this list)
An “admin” page where I can edit and save the stories, and also upload training data from the growing collection to build custom models (private)
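For the curious, the generation step looks roughly like this (a simplified sketch against the raw REST endpoints; the field names and prompt wording are illustrative, not the app’s actual code, and the key of course stays server-side):

```typescript
interface StoryForm {
  place: string;
  purpose: string;
  companion: string;
  temperature: number;
  model: string;        // e.g. "text-davinci-003"
  illustrated: boolean;
}

const API_KEY = process.env.OPENAI_API_KEY!; // server-side only

async function generateStory(form: StoryForm): Promise<{ text: string; imageUrl?: string }> {
  // 1) Text completion with Davinci, prompt forged from the form ingredients.
  const prompt =
    `Write a bedtime story about Fifi the Girafe and Rhino the Rhinoceros. ` +
    `Place: ${form.place}. Purpose: ${form.purpose}. Companion: ${form.companion}.`;
  const completion = await fetch('https://api.openai.com/v1/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${API_KEY}` },
    body: JSON.stringify({
      model: form.model,
      prompt,
      temperature: form.temperature,
      max_tokens: 800,
    }),
  }).then(r => r.json());
  const text = completion.choices[0].text.trim();

  // 2) Optional illustration with DALL-E.
  let imageUrl: string | undefined;
  if (form.illustrated) {
    const image = await fetch('https://api.openai.com/v1/images/generations', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${API_KEY}` },
      body: JSON.stringify({ prompt, n: 1, size: '512x512' }),
    }).then(r => r.json());
    imageUrl = image.data[0].url;
  }

  return { text, imageUrl };
}
```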
If you’re interested, you can review the stories (in French, sorry…) at https://japansio.info/fifi/ and tell me what you think !
My next challenges involve:
Getting better images that take into account the result of the text completion rather than just the prompt (because Davinci adds a lot to the story that is not always reflected in the image)
Getting better story results by reducing repetition and wasted passages about concepts the users already know
Optimising the prompt size by fine-tuning a model to build the stories with only the three ingredients from the form as the prompt, instead of the current prompt the app forges from those ingredients (see the sketch below)
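For that last point, the training data could be exported from the growing collection in the legacy prompt/completion JSONL format, roughly like this (a sketch; the separator and stop-token conventions come from the fine-tuning guide, and the field names are mine, not the app’s actual export code):

```typescript
// Export the proofread stories as fine-tuning data: the prompt shrinks to the
// three form ingredients, the completion is the full story.
interface SavedStory {
  place: string;
  purpose: string;
  companion: string;
  text: string; // the proofread story
}

function toFineTuneJsonl(stories: SavedStory[]): string {
  return stories
    .map(s => JSON.stringify({
      // A fixed separator marks the end of the prompt...
      prompt: `${s.place} | ${s.purpose} | ${s.companion}\n\n###\n\n`,
      // ...and a leading space plus a stop sequence bound the completion.
      completion: ` ${s.text} END`,
    }))
    .join('\n');
}
```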
Thank you for your attention; I would love to hear your thoughts.
I’m bookmarking this thread in case you post updates here!
Have you considered using Stable Diffusion for image gen? Not that DALL-E is super expensive, but I figure if it’s open to the public, it could be a concern
The audio is generated once from the admin page, so feel free to listen to the stories on the public page: you will only play back the saved version of the audio rendition, never trigger a new audio generation.
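The idea in code, more or less (a minimal sketch assuming Amazon Polly, which comes up further down the thread; the voice, region, and file path are illustrative):

```typescript
import { PollyClient, SynthesizeSpeechCommand } from '@aws-sdk/client-polly';
import { writeFile } from 'node:fs/promises';

const polly = new PollyClient({ region: 'eu-west-1' });

// Called once from the admin page; the public page only serves the saved file.
async function renderAudio(storyId: string, text: string): Promise<void> {
  const res = await polly.send(new SynthesizeSpeechCommand({
    OutputFormat: 'mp3',
    Text: text,
    VoiceId: 'Lea',    // a French neural voice
    Engine: 'neural',
  }));
  // SDK v3 exposes the audio as a stream with byte-array helpers.
  const bytes = await res.AudioStream!.transformToByteArray();
  await writeFile(`audio/${storyId}.mp3`, Buffer.from(bytes));
}
```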
Looking forward to expanding AI Stories with more features as I explore the world of Machine Learning.
Very cool!
A while ago, I created an app that reads free, Creative Commons books to children: Libris free children’s books.
Part of the idea is that it highlights words as it speaks them, helping children associate the printed and the spoken words.
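The sync is roughly this (a simplified sketch assuming Amazon Polly word speech marks, which come up below; the real app has more bookkeeping):

```typescript
// Polly returns speech marks as newline-delimited JSON when you request
// OutputFormat: 'json' with SpeechMarkTypes: ['word'].
interface SpeechMark {
  time: number;   // milliseconds from the start of the audio
  type: 'word';
  start: number;  // offsets of the word in the source text
  end: number;
  value: string;  // the word itself
}

function parseSpeechMarks(ndjson: string): SpeechMark[] {
  return ndjson.trim().split('\n').map(line => JSON.parse(line));
}

// Highlight each word as the saved audio plays back.
function syncHighlights(
  audio: HTMLAudioElement,
  marks: SpeechMark[],
  highlight: (mark: SpeechMark) => void,
): void {
  let next = 0;
  audio.addEventListener('timeupdate', () => {
    const ms = audio.currentTime * 1000;
    while (next < marks.length && marks[next].time <= ms) {
      highlight(marks[next++]);
    }
  });
}
```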
It’s also an Angular app
I realized that I never made the code public on github. Let me know if you’re interested, and I’ll do that.
I didn’t quite manage to get results as good as yours in my app, because there are constant mismatches between the speech marks and the generated text in French (inconsistent white space before punctuation marks, multiple line returns, skipped words in the speech marks, etc.), but I’m happy enough with the result for now.
I should probably try lowering the reading rate though… both for pedagogical and reliability reasons.
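I may also try a normalization pass before sending the text to Polly, something like this (just an idea, assuming the mismatches really do come from the irregular spacing and blank lines):

```typescript
// Normalize the generated French text so the speech-mark offsets line up with
// what the page displays.
function normalizeFrench(text: string): string {
  return text
    .replace(/\r\n/g, '\n')                            // unify line endings
    .replace(/\n{2,}/g, '\n')                          // collapse multiple line returns
    .replace(/[ \u00a0\u202f]*([;:!?])/g, '\u00a0$1')  // exactly one no-break space before high punctuation
    .replace(/[ \t]{2,}/g, ' ')                        // collapse runs of spaces
    .trim();
}
```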
And big, big kudos to the OpenAI Community for the encouragement and inspiration so far with my pet project.
Ah, that makes sense. I never tested my app in French.
Yes, I found that lowering the speed to around 0.75 works well. I also use the Ivy voice, which is a “child” voice, but it looks like they don’t have that in French.
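In Polly that is just an SSML prosody wrapper, roughly (a sketch; pick your own region and voice):

```typescript
import { PollyClient, SynthesizeSpeechCommand } from '@aws-sdk/client-polly';

const polly = new PollyClient({ region: 'us-east-1' });

// Escape characters that are significant in SSML/XML.
function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

async function readSlowly(storyText: string) {
  const ssml = `<speak><prosody rate="75%">${escapeXml(storyText)}</prosody></speak>`;
  return polly.send(new SynthesizeSpeechCommand({
    OutputFormat: 'mp3',
    TextType: 'ssml',
    Text: ssml,
    VoiceId: 'Ivy', // the "child" voice mentioned above (English only)
  }));
}
```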
Hi elsiokun,
I’m Matteo. I built several multilanguage Alexa skills based on the davinci-003 language model; you can see them in another OpenAI community thread. Why don’t you try moving all the stories to Amazon Alexa? It is a device that lets you have an interaction model based on the human voice, it is widespread around the globe, and it is easy to interact with. It would be possible to build characters and make them ‘alive’ in the virtual world. I think it would be a good extension to what you have done. Let me know if you want support.
Just a quick note to say that I have successfully migrated AI Stories to the GPT-3.5 Turbo model and the chat endpoint.
Using the “system” message of the chat messages, I was able to influence the content of the stories to add humor and/or suspense; soon this will be done dynamically from the form (forging a system prompt from a series of form questions to dose humor, suspense, etc.).
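Roughly like this (a simplified sketch; the humor/suspense sliders are where the form is heading, not what it has today):

```typescript
const API_KEY = process.env.OPENAI_API_KEY!; // server-side only

// Doses picked from hypothetical form questions.
interface ToneForm {
  humor: 0 | 1 | 2;
  suspense: 0 | 1 | 2;
}

async function generateChatStory(userPrompt: string, tone: ToneForm): Promise<string> {
  const doses = ['no', 'a touch of', 'a lot of'];
  // Forge the system prompt from the form answers.
  const system =
    `You write bedtime stories about Fifi the Girafe and Rhino the Rhinoceros ` +
    `for young children, with ${doses[tone.humor]} humor and ${doses[tone.suspense]} suspense.`;
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${API_KEY}` },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [
        { role: 'system', content: system },
        { role: 'user', content: userPrompt },
      ],
    }),
  }).then(r => r.json());
  return res.choices[0].message.content;
}
```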
The next step will be to use the chat endpoint to summarize the produced story to serve as the DALL-E prompt.
Thank you for the encouragement and advice. I used this “visually striking keywords” approach and the results are much better for the illustrations. And with GPT-3.5 Turbo priced at a tenth of Davinci, it becomes viable to run a second chat completion (to get the summary) for each generated story and get much better images.
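The second completion is simply this (a sketch of the approach, not my exact prompt):

```typescript
const API_KEY = process.env.OPENAI_API_KEY!; // server-side only

// Distill the finished story into visually striking keywords for DALL-E,
// so the image reflects what the model actually wrote, not just the form.
async function illustrationPrompt(story: string): Promise<string> {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${API_KEY}` },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      messages: [
        {
          role: 'system',
          content: "Summarize the story as a short list of visually striking keywords for a children's book illustration.",
        },
        { role: 'user', content: story },
      ],
    }),
  }).then(r => r.json());
  return res.choices[0].message.content; // used as the DALL-E prompt
}
```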
Did you manage to get decent results in terms of quality with Polly?
I am using it, but their TTS (even neural) sounds too robotic to me. Not sure if there are any better solutions around with more natural, human-sounding voices.
Hello @Denis1 , so far I’m happy with the French voice from Polly. It still stumbles on two things:
Pronunciation of some specific words in French (can’t blame it for that)
Tone when there is dialogue, such as:
“Hello”, she said smiling, “How are you today?”
But overall it does a good enough job for my purposes.
I heard that Google Cloud TTS is supposedly better, but it was too much implementation work for me (OAuth and such…). You should definitely check it out: https://cloud.google.com/text-to-speech
But maybe we can hope that OpenAI will someday release the opposite of Whisper (text-to-speech)?