I just completed a new feature for my web app (I won’t share the link, due to the policy against self-promotion) that lets people write entire books with AI, using a technique I discussed here a few weeks ago but have only now implemented.
I wanted to get feedback and ideas from everyone on how to improve the concept, and to hear from anyone who has done this themselves and wants to discuss it.
Here’s how it works: First we build a prompt like: “I’m creating a book on Python geared towards Java Developers learning Python. Make the book have 20 chapters, and give me just the JSON structured like this: ${JSON}”.
I give it the format of JSON that my app can import, so it can automagically create a tree structure for Book, Chapter, Section. In other words, the LLM replies with what is essentially a good Table of Contents (or outline) for the book.
Next, my app lets the user go to any “node” on this tree and auto-generate the content by submitting another prompt that requests the specific Chapter and Section content. It works great. The mechanics of passing back the Chapter and Section, and all the other prompting, are automatic; the user experience is simply that they can ask for sections of the book.
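To illustrate the import step, here is a minimal sketch of parsing an LLM-returned outline into a Book → Chapter → Section tree. The JSON shape and the `Node` class are assumptions; the app’s actual import format isn’t shown.

```python
import json

# Hypothetical shape of the JSON the LLM is asked to return -- an assumption,
# since the real ${JSON} template isn't shown in the post.
OUTLINE_JSON = """
{
  "book": "Python for Java Developers",
  "chapters": [
    {"title": "Getting Started",
     "sections": [{"title": "Installing Python"}, {"title": "The REPL"}]},
    {"title": "Syntax Differences",
     "sections": [{"title": "Indentation"}, {"title": "Dynamic Typing"}]}
  ]
}
"""

class Node:
    """One node in the Book -> Chapter -> Section tree."""
    def __init__(self, kind, title):
        self.kind = kind          # "book", "chapter", or "section"
        self.title = title
        self.children = []

def build_tree(outline_json):
    """Parse the outline JSON into a tree the app can navigate."""
    data = json.loads(outline_json)
    book = Node("book", data["book"])
    for ch in data["chapters"]:
        chapter = Node("chapter", ch["title"])
        chapter.children = [Node("section", s["title"]) for s in ch["sections"]]
        book.children.append(chapter)
    return book

book = build_tree(OUTLINE_JSON)
print(book.title, [c.title for c in book.children])
```

Asking for “just the JSON” makes the reply machine-readable, which is what lets the app build the tree without any manual cleanup.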
I haven’t yet experimented with making a prompt that contains all the paragraphs of a given “section” and asks for a good “next paragraph”, but that might work too, as an additional feature. This is such a powerful technique that the entire concept seems like it could even be a feature of LangChain, for example.
The screenshot below is the System Prompt the app ultimately uses for each new ‘paragraph’ it generates, with of course, as stated, the full context (chapter, section, etc) being provided in the actual prompts:
I spent five hours last night making one, and I woke up to find you’d also made one, ha. (It was more like 10 minutes for the code and 4 hours and 50 minutes for the UI.) Feel free to share the link here. I’m also interested in what you gave it as context: did you give it the entire current book and tell it to write the next chapter, which possibly just eats money?
My approach is not that fancy, really, because, like you said, it’s the GUI that’s really got to be right to make it truly useful. My prompt just sends something like this:
Book Title: ${book}
Chapter Title: ${chapter}
Section Title: ${section}
${moreInstructions}
And the System Prompt takes care of informing the AI about the overall purpose, content, target audience, etc. so that any prompt that comes in is interpreted as someone adding content to the book.
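A minimal sketch of this prompt assembly, assuming a system prompt along the lines described (the actual one is only shown as a screenshot, so the wording here is a guess):

```python
# Assumed system prompt -- the real one is in the screenshot, not the text.
SYSTEM_PROMPT = (
    "You are helping write a book. Treat every request as content being "
    "added to that book; match its purpose, target audience, and tone."
)

def user_prompt(book, chapter, section, more_instructions=""):
    """Mirror of the ${book}/${chapter}/${section} template shown above."""
    lines = [
        f"Book Title: {book}",
        f"Chapter Title: {chapter}",
        f"Section Title: {section}",
    ]
    if more_instructions:
        lines.append(more_instructions)
    return "\n".join(lines)

print(user_prompt("Python for Java Developers",
                  "Syntax Differences", "Dynamic Typing",
                  "Keep it under 500 words."))
```

Because the system prompt carries the book’s overall context, the per-request prompt stays tiny: just the three titles plus any extra instructions.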
I don’t try to generate all in one shot. The user has to make a button click to generate each piece of individual content.
Interesting. I had better results using multiple agents for this type of task. Like using three: a writer, an editor, and an expert (in your case, a senior Python/Java dev).
So the expert will brainstorm what’s important for the chapter, the writer will phrase it nicely, making it easy to read and understand, and the editor will fine-tune the final result to make it coherent with the rest of the chapters and the overall tone the book should have.
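The expert → writer → editor pipeline described above could be chained like this. The LLM call is stubbed out so the chaining logic is visible and runnable; in a real app `call_llm` would hit an actual model API, and all role wordings here are assumptions.

```python
def call_llm(system, prompt):
    """Stub standing in for a real model API call."""
    return f"[{system}] {prompt}"

def expert(topic):
    # Pass 1: the domain expert brainstorms what the chapter must cover.
    return call_llm("senior Python/Java dev",
                    f"List the key points a chapter on '{topic}' must cover.")

def writer(points):
    # Pass 2: the writer turns those points into readable prose.
    return call_llm("writer",
                    f"Turn these points into clear, readable prose: {points}")

def editor(draft, book_tone):
    # Pass 3: the editor aligns the draft with the book's overall tone.
    return call_llm("editor",
                    f"Revise for coherence with the book's tone ({book_tone}): {draft}")

chapter = editor(writer(expert("Dynamic Typing")), "friendly tutorial")
print(chapter)
```

Each stage is one focused inference, which is the point: no single prompt has to brainstorm, draft, and polish all at once.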
Making it Agentic is a great idea! There’s no doubt that would give better-quality final output. I should probably also make it capable of automatically generating a first pass at the entire book (which is trivial to do).
Also, it would be nice if the contributions from the Agents were able to be easily compared visually in some kind of Diff Viewer, where the changes are highlighted in colored text as well. I bet Microsoft cloud editors will have this soon, if not already, and this entire stack of capability really belongs in LangChain if you ask me.
Can I ask a question (and it’s not designed to be rude, so apologies in advance). Are humans interested in reading AI books? Is it possible to create something with value in this way?
I’m working with AI and getting it to output useful content can be a challenge to say the least. Just wondering if this is something real people will read?
The way I designed my solution allows the human author to create as much of the book as they want, including the initial book outline (Table of Contents, hierarchy), so humans can use it just for inspiration and suggestions if they want, or use the AI’s output unedited.
Also it’s not really a “book”, it’s just a hierarchical document. I only use the terms book, chapter, section, subsection because humans are familiar with that model of organization. The entire purpose of my app is to do “hierarchical documents” better than anyone else ever has.
I like your story builder project, and your product designs and site! I just read through it all.
I think probably for factual writing (like a book on “How to learn Python”) the raw output can be great, without human editing, but obviously this doesn’t apply to more fictional writing (like novels of course!). Also what I love about AI-generated writing is you can for example say “Write a book for Children to Learn Python”, and give an age level for the reading audience.
EDIT:
However for writing novels I can see how you could give the AI a very detailed knowledge about each character in a book, and the situation they’re in, and what you want to happen in any “scene” of the book, and then see how it would write the narrative dialog. That’s a very powerful concept.
Yes the AI handles factual writing much better than fictional writing. The problem with fictional writing is that the AI writing is coherent but not compelling. Meaning, it is ‘logical’ but it lacks any sort of ‘emotional feeling’. A non-fiction book wouldn’t need that.
In my experience ChatGPT has been able to create absolutely amazing fiction writing. Just say “in the style of Hemingway” and you’ll get something that brings tears to your eyes, even if you have elevated levels of testosterone. lol.
I once asked ChatGPT to write a sad poem that was simply about a flower in a vase that was sad because its life was over, but it could still bring joy to people’s lives for a few more days, yadda yadda. It wrote every bit as well as Hemingway, at least for that short poem. Then I fed the entire poem into the ChatGPT image generator too, and it showed the perfect image of a sad-looking flower in a vase. This stuff is miraculous.
The issue for me has been “long form” fiction content creation: basically, anything more than 1000 words. If you have any good examples (and how you did it), I would love to see them.
That’s why you have to do it in chunks. I feed it the previous chapters as context and tell it to write the next chapter. You have to tell it how many chapters there will be in total so it doesn’t resolve the whole plot in chapter one.
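The chunked approach above could be sketched like this: each request carries the chapters written so far plus the total count, so the model doesn’t wrap the story up early. The exact wording is an assumption.

```python
def next_chapter_prompt(previous_chapters, total_chapters):
    """Build the prompt for the next chapter, given all prior chapters."""
    n = len(previous_chapters) + 1
    context = "\n\n".join(
        f"Chapter {i + 1}:\n{text}"
        for i, text in enumerate(previous_chapters)
    )
    return (
        f"This novel will have {total_chapters} chapters in total.\n"
        f"Here is everything written so far:\n{context}\n\n"
        f"Write chapter {n}. Do not resolve the main plot before "
        f"chapter {total_chapters}."
    )

prompt = next_chapter_prompt(
    ["It was a dark night...", "The chase began."], 12)
print(prompt)
```

Note that feeding every previous chapter back in is exactly the “money eating” concern raised earlier; summaries of earlier chapters could shrink the context at the cost of detail.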
However, with an entire novel, I have found that the AI can create helpful content, but there is no way to keep the AI on track with story elements, such as my main character having his arm in a cast. So I created another free open-source program, https://AIStoryBuilders.com, to track these sorts of things and ensure the AI doesn’t keep creating content that makes no sense (the character keeps doing things he can’t do with a cast on) and that I have to keep editing out.
In addition, I discovered I have to keep track of “Timelines”. The AI has no sense of time. This is where I can see the AI is wayyyyy off from true AGI.
I spent months getting AIStoryBuilders working correctly…
Have you tried something like giving the “Book Plot” as an outline of all the major things that ever happen in the book, and including that in the System Prompt? Then as it generates, give it the context of what section of the book it’s writing for.
Also you can probably write fiction books/novels by simply describing what will happen between the characters in any given scene and letting it write “in the style of Tom Clancy” for example for the narrative. I’ve never checked to see if OpenAI can generate narratives between book characters. Probably it can.
But if you’re saying it keeps making people do things you can’t do with an arm in a cast, for example, that’s the kind of thing where you could maybe use a two-pass Agentic approach, where you tell the agent to just look for logical inconsistencies like that before the content is accepted.
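A toy sketch of that two-pass idea: a second “checker” inference reviews the draft against known constraints before it is accepted. Here the checker is stubbed with a plain substring scan so it runs standalone; in practice it would be another LLM call, and all names here are illustrative.

```python
# Known story constraints the checker enforces (assumed example).
CONSTRAINTS = ["The main character's right arm is in a cast."]

def check_draft(draft, constraints, forbidden_actions):
    """Pass 2: return suspected inconsistencies (empty list means accepted)."""
    issues = []
    for action in forbidden_actions:
        if action in draft.lower():
            issues.append(
                f"Draft says he '{action}' despite: {constraints[0]}")
    return issues

draft = "He grabbed the rope with both hands and climbed."
problems = check_draft(draft, CONSTRAINTS, ["both hands", "climbed"])
print(problems)   # non-empty -> send the draft back for revision
```

The generator never sees the checklist; the checker’s only job is to veto drafts, which keeps each inference focused on one task.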
I tried those things and could not get that to work. The AI just completes the next word. It cannot think at all. No ‘smart prompt’ would overcome this inherent limitation.
What I ended up doing in AIStoryBuilders was to use embeddings and RAG to feed the AI all the relevant information for each paragraph it was working on.
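The retrieval side of that approach could be sketched as follows, with a toy bag-of-words “embedding” standing in for a real embedding model: for each paragraph being written, only the most relevant stored facts are retrieved and fed into the prompt. This is not AIStoryBuilders’ actual code, just an illustration of the RAG pattern it describes.

```python
from collections import Counter
import math

def embed(text):
    # Toy stand-in for a real embedding model: word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Story facts the app tracks (assumed examples).
facts = [
    "John's right arm is in a cast until chapter nine.",
    "Mary owns a red bicycle.",
    "The story is set in Lisbon in 1974.",
]
fact_vecs = [(f, embed(f)) for f in facts]

def retrieve(paragraph_plan, k=1):
    """Return the k stored facts most relevant to the paragraph being written."""
    q = embed(paragraph_plan)
    ranked = sorted(fact_vecs, key=lambda fv: cosine(q, fv[1]), reverse=True)
    return [f for f, _ in ranked[:k]]

print(retrieve("John tries to ride the bicycle with his arm in a cast"))
```

With real embeddings the idea is the same: the prompt for each paragraph carries only the facts that matter to it, rather than the whole story bible.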
When doing inference, the LLM is sort of following a “single path” (excluding the temperature aspect, where it randomly chooses a branch on that single path), and so, just as the human mind cannot hold two thoughts simultaneously, neither can the LLM. This is why Agentic stuff is such a hot topic right now: you have to make each LLM inference do one specific thing. When you have an urge to ride your bicycle but then remember you can’t because of a cast, that remembering of the cast is similar to a ‘second inference pass on top of the first’. So yeah, anyone trying to do everything all in one go will fail. That’s not how LLMs, nor brains, work.
You can have an Agent to check for grammar, one to check for logical inconsistencies, etc. Each Agent is following an “inference path” of its own, and that inference path is by definition far different from the path that created the initial content.
I’m also playing with this. I’ve extended two VS Code extensions to do so. One uses bookmarks that look like @something and @out to preserve or cut some text from the prompt. I even have a @summarize(tagname, ratio) to summarize some parts (sent to GPT automatically behind the scenes when something changes in the source, tag, or ratio).
The other extension is kind of a chat, but it uses the output to create the prompt.
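One plausible way to pull bookmarks like @something, @out, and @summarize(tag, ratio) out of the source text is a small regex pass. The extension isn’t published, so the tag syntax and names below are assumptions:

```python
import re

# Matches "@name" optionally followed by "(arg, ratio)" -- an assumed syntax.
TAG_RE = re.compile(r"@(\w+)(?:\(([^,)]+),\s*([\d.]+)\))?")

def parse_tags(text):
    """Return (name, arg, ratio) tuples for each @tag found in the text."""
    tags = []
    for m in TAG_RE.finditer(text):
        name, arg, ratio = m.groups()
        tags.append((name, arg, float(ratio) if ratio else None))
    return tags

sample = "Intro text @keep more prose @out cut this @summarize(history, 0.3)"
print(parse_tags(sample))
```

Once tagged spans are identified, the extension can keep, drop, or send each one to GPT for summarization before assembling the final prompt.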
Following much discussion with GPT, we first created an outline, various themes, files for various aspects and characters of the story (for which I gave it background and motivation, as requested, based on the people I want them modeled on), and finally a list of scenes. Then, following GPT’s advice, we created various dialogue samples.
I still have to start the real writing, because I’m still not satisfied with, shall we say, the central plot line.
But I have perhaps 200,000 tokens of GPT generated stuff around the book.
It’s not a simple process. Scenes have to be rewritten sometimes numerous times before it gets it right.
But I can say GPT seems pretty happy with the set of things we have right now.
(if anybody is interested, just speak to me and I can share the extension files - I did not yet publish them)
I tried this when I wrote The Framework I, but I quickly found out it just repeats itself in many places, because you’re generating (“zooming into”) chapters separately.
I even wrote a very detailed architecture for such a piece of software, but it’s on hold now. I also share your struggle with the UI taking more effort than the algorithm itself, which is partially the reason I couldn’t afford to keep working on this (I like to iterate really fast).
I think until we get a gigantic-context LLM, writing books will still be a struggle.
But for AI-assisted writing, I agree, LLMs are amazing.
This is not to discourage you. Software like this is happening, it just takes a few more years (months?)