@PeculiarPath I believe that with good prompt design and preparatory work (I like your idea), the current version of the OpenAI interface might suffice to write ‘a novel’ (I am thinking of approaches like the snowflake method), but the lack of an overarching ‘log’ (as @boudinot calls it - the actual writer’s “memory”) will almost certainly result in less interesting fiction (and almost certainly not in “literature”).

1 Like

There are several methods out there for fiction writing. I like a structured approach (regardless of which method) because the groundwork of codifying the requirements has, for the most part, already been done.

Choose a method and turn it into an algorithm. Of course, as with any creative work, a lot of the “formulas” out there contain a great deal of subjectivity and ambiguity, and are very often vague and open to interpretation. Even with Larry Brooks’ Story Engineering and Story Physics, titles that suggest a more exact approach, creativity is hardly an exact science, and humans cannot be beaten when it comes to that.

The processes I describe there are merely attempts to use those methods to work around the limitations of the system (memory, planning, plotting) in order to mimic the structure present in those books.

My crude understanding of the nuts and bolts behind GPT-3 is that it is a highly trained prediction engine that calculates the most probable word (token, actually) to come next, based on what has come before - with “before” being your immediate, limited prompt plus the whole of the corpus used for pre-training, which produced the 175B parameters that make up davinci.

Predicting the next token is hardly planning ahead, let alone setting things up to pay off in later chapters, keeping track of character traits and personality, or knowing how a character is likely to behave.

I keep going back to the analogy of synthesizers and drum machines. When I started using GPT-3 I started listening to Nine Inch Nails again. No one could ever accuse Trent Reznor of making music that wasn’t emotional or expressed some sort of fundamentally human spirit.

In my opinion, if you look to GPT-3 and AI in general as a means of replacing the process of writing, then you’ll end up disappointed. Writing to me is a process that can never and maybe should never be completely understood to the point that it can be reverse engineered. It exists in the same category as dreaming and sex, an experiential rather than theoretical activity. Susan Sontag in Against Interpretation said “In place of a hermeneutics we need an erotics of art.” My understanding of this is that reducing the writing process to merely an algorithm negates the very reason it exists in the first place, as something to be experienced subjectively, emotionally, viscerally.

It’s the collision of algorithms and the subjective nature of art that excites me most. I love all this thinking around how to bend the tool to produce a certain kind of output, while recognizing that it allows for the creation of entirely new art forms and genres, like electronic music, that wouldn’t exist otherwise.

1 Like

Kinda had the same idea when I saw all those articles about GPT-3 and journalists using the AI to write their pieces.
While waiting to be granted access, I tried experimenting with GPT-2. With the latest version, I was able to feed my novels and short stories to the AI (also because I’m in Québec and I write in French, so I had no idea how it would react). I wanted to see if an AI could write in my style.
Could it be used to write a short story just like it did with the articles?
Short answer: no.
Long answer: hell no!
Even though I could recognise names or words from my novels, the sentences made no sense (at all!). I would basically have to rewrite everything myself, which is not the point.
GPT-3 gave me more hope for about a second. The writing is better, but it goes nowhere, says nothing, has no substance.
The thing is, part of the stories we write isn’t even on paper. It’s in between the lines.
Plus, the AI works with rules. You teach it sentence or story structure, or grammar, and so on, and it will reproduce that. But artists are constantly breaking the rules. Can you write a rule telling the AI to break the rules and be original?
I have to agree with @Romolo. You can get lots of words, but it won’t be literature. Not yet.

Literature is a slippery concept - it contains a judgement, and fiction written by AI makes no such pretension.

However, I’m more hopeful than you, @py.villeneuve! The creative powers of GPT-3 for fiction writing are insane. It can lead to literature, but not in a push-the-magic-button way. A writer is needed - a writer who understands how fiction works and how prompts work, a writer who controls the rodeo bull. Any non-writer will be tossed off immediately.

I believe AI-enriched fiction will be a thing a few years from now. It will produce novels that are literary and revolutionary. They will be written by humans and AI together.

I think it comes down to whether or not we consider GPT-3 as a “labor saving device.” If so, then it’s going to disappoint us by not autonomously producing works of coherent literature. But if we think of it as an instrument that requires the same amount–but a different kind–of work from the writer, then I’m with @Romolo in feeling it can spark a whole new kind of fiction.

DALL·E could be used to provide a graphical overlay for the novel, using heat maps for comparative analysis alongside thousands of other successful books.

I am not anticipating that any one model will do all of the work of story writing. I see potential for a set of task-specific fine-tuned models to help with things like “paraphrase this”, “finish the scene”, or “make an analogy of the highlighted text”.

I’ve found that a “reverse summarizer” works very well for expanding on a block of text, which can then be expanded on again. Now I’m fiddling with controlled generation that has something equivalent to the “log” you mentioned: basically, if you create a fine-tuned model trained on all sections of a story, and there is one tying element that connects the sections, it keeps the output coherent.
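For anyone curious what that could look like in code: the actual prompt wasn’t shared above, so the following is only a rough sketch, assuming the Completions-era openai Python package and the davinci engine; the prompt wording, settings, and helper name are all placeholders.

    import openai  # the Completions-era openai package

    openai.api_key = "YOUR_API_KEY"

    def reverse_summarize(summary, temperature=0.8, max_tokens=300):
        # Expand a terse summary into a longer passage of prose.
        prompt = (
            "Expand the following summary into a detailed scene, "
            "keeping every stated fact and adding description.\n\n"
            "Summary: " + summary + "\n\nScene:"
        )
        response = openai.Completion.create(
            engine="davinci",
            prompt=prompt,
            temperature=temperature,
            max_tokens=max_tokens,
            stop=["Summary:"],  # keep it from starting a fresh example
        )
        return response["choices"][0]["text"].strip()

    # Each expansion can itself be fed back in and expanded again.
    draft = reverse_summarize("Mira finds the lighthouse dark and the keeper gone.")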

1 Like

I’m super curious about your reverse summarizer and controlled generation. Can you share any examples?

Yeah sure, there are three of them. The first one is just the reverse summarizer; the others are attempts at controlled generation. The third one’s… interpretive lol.

3 Likes

Hey everyone! New to the chat! On the topic of Transformers being able to write a cohesive “long term” novel: they must have some sort of differentiable memory attached.

Another issue is recurrence slowing things down. I.e. when you train GPTm (m for memory - I made it up!), you don’t want to individually feed the output of the old prediction into the next training step - i.e. you don’t want any correlation between training steps 1 and N. In GPTx/BERT/variants, they just assume the output is right (teacher forcing).

So the issues are:

  1. How to add differentiable memory to GPT?
  2. How to do this without a recurrence relation?

I first thought of something boring (haven’t tested it). You make a matrix M (called memory) of size (m, f). f is the original embedding dimension (like 768). m can be massive, say 20,000. For every batch of text, you pass through the MH attention layers and dense layers all the way to the final softmax layer, then somehow copy BERT’s CLS approach, “extract” the CLS 768-dim vector, and compute v = M * CLS, which gets you a tall skinny vector (20,000 by 1).

Then apply a long-tailed sigmoid, i.e. 1/(1+exp(-0.5v)), to v. Then take the outer product of the sigmoid output with the transposed CLS vector (CLS^T, so the shapes work out). You’ll get a (20,000 by 768) matrix the size of M.

Then M(t+1) = M + 1/(1+exp(-0.5v)) * CLS^T. Then append M onto X (which can be very problematic), or somehow “summarise” M (e.g. via a Clarkson-Woodruff Transform shrinking M from (20,000, 768) to, say, (500, 768)). You can even train the summarisation weight matrix S, so we get:

M(t+1) = M + 1/(1+exp(-0.5v)) * CLS^T
X(t+1) = concat[ X(t+1) , S * M(t+1) ]

The CW Transform will just “freeze” S as a hash table.
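As a sanity check on the shapes, here is a minimal NumPy sketch of that write-and-summarise step. The sizes and the 0.5 slope are the ones above; the random placeholders and function names are just for illustration.

    import numpy as np

    m, f, k = 20_000, 768, 500                  # memory slots, embedding dim, summarised rows
    M = np.zeros((m, f))                        # memory matrix, starts empty
    S = np.random.randn(k, m) / np.sqrt(m)      # summarisation matrix (trainable, or frozen)

    def write_memory(M, cls_vec):
        # One memory write from a pooled CLS-style embedding of shape (f,).
        v = M @ cls_vec                         # (m,) similarity of each slot to the current context
        gate = 1.0 / (1.0 + np.exp(-0.5 * v))   # long-tailed sigmoid gate
        return M + np.outer(gate, cls_vec)      # rank-one update, shape (m, f)

    def summarised_memory(M):
        # Shrink M to k rows before concatenating it onto the token embeddings X.
        return S @ M                            # (k, f)

    M = write_memory(M, np.random.randn(f))     # stand-in for a real CLS embedding
    extra_context = summarised_memory(M)        # rows to append to X at the next step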

This has 2 benefits:

  1. Incorporates long-term attention, i.e. the dot product reinforces memories similar to the current context more often, and discounts less important memories.
  2. Fixes catastrophic forgetting. The use of a long-tailed sigmoid allows long-term memories to stay in place and not vanish.

However, there is a clear issue with this approach:

  1. Recurrence comes back! Batches must now be sequential… I.e. previously you could take 1,000,000 books, scramble every page, and GPT would be fine. Now GPTm needs to train ONLY on book 10, then 103, then 12039, with page orderings intact.
2 Likes

On further thoughts:

  1. A long-tailed sigmoid actually might be bad; rather, the rank-one update possibly takes care of catastrophic forgetting, since at the start the CLS token updates every line of the memory matrix M equally. tanh might be better.
  2. A new [MEM] token can be added in tandem with [CLS]. During masked self-attention, instead of using an upper-triangular mask of -inf and a lower triangle of 0s, add an extra [MEM] token which the whole sentence is allowed to attend to (see the mask sketch after this list). Then feed the [MEM] token back for every NEW page in a novel. The [MEM] token ISN’T STATIC, and every new book wipes [MEM]’s state back to all 0s.
  3. The CW Transform is not needed. Rather, shrink the memory matrix down to 10 rows.
  4. Recurrence isn’t that bad! Each batch row will be a separate random novel (novel A, G, Z, K etc). Each new training step will be a new page for every random novel (novel A1->2, G1233->1234, Z24->25, K213->214). The recurrence relation is only over pages / sentences.
  5. Each novel has its own memory matrix M initialised as all 0s. To keep bias from creeping into the model (say a novel has 10,000 pages), you train GPTm in windows of pages, then save the per-book M to disk and get a new novel. Retrieve the book’s unique M later. Likewise, every new batch might reselect novel X. Delete the old memory matrix M and restart training.
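Here is the mask sketch mentioned in point 2 - one possible reading, with a single [MEM] slot placed at the end of the window and its column opened in the additive attention mask so every token can attend to it. The placement and the function name are my assumptions.

    import numpy as np

    def causal_mask_with_mem(seq_len):
        # Standard causal mask for seq_len tokens plus one trailing [MEM] slot
        # that every position is allowed to attend to.
        n = seq_len + 1                                # ordinary tokens + [MEM]
        mask = np.triu(np.full((n, n), -np.inf), k=1)  # -inf above the diagonal, 0 on/below
        mask[:, -1] = 0.0                              # open the [MEM] column for every row
        return mask

    # Added to the attention scores before the softmax; the [MEM] state itself is
    # carried over page by page and reset to zeros at the start of each new book.
    print(causal_mask_with_mem(4))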
1 Like

Daniel,

I will first of all say that I don’t understand most of what you’re saying here, but that isn’t a criticism. (Or rather, it’s a criticism of myself.) Thank you for thinking about this and imagining what’s possible. I appreciate it, and am excited about where this might lead.

I am coming to GPT-3 as a creative writer rather than as a software developer. I can speak to the weird creative process of writing a novel as a human. I’m hoping to articulate what I envision for an AI writing tool to the point of someone like yourself being able to engineer it.

I currently have the first 35 pages of a novel, totalling about 7,000 words. I would like to find some way to use this as a prompt.

I am currently developing a way to write short stories with GPT-3. They typically start with a one-shot prompt. I provide a paragraph or two of contextual orientation for the AI, followed by the first couple paragraphs. Once I generate a completion, I edit it to my satisfaction or sometimes delete it altogether and start again. I play with the settings along the way, particularly the temperature. I keep resubmitting the growing completion until I run out of tokens. As I near my token limit, I attempt to “land the plane” so to speak, bringing the story to what feels like a natural conclusion.

I’d basically like to engage in this same process but with bigger prompts.

Does this process sound compatible with the system you describe?

Again, thank you so much.

2 Likes

No problem! Sorry on my part for being vague and not explaining it much!

Yeah, so your method sounds reasonable. GPT-3 handles a window of (I can’t remember exactly) 2,048 tokens? Essentially around 1,024 words, since tokens aren’t words but word pieces. So whole paragraphs are possible.

Say we have GPT0 (a fake, super-weak GPT with a 4-word window). I start the sentence as “Hello my name is”

  1. Input [Hello], [my], [name], [is] into GPT0.
  2. GPT0 predicts [Daniel] as the next word.
  3. Re-feed [my], [name], [is], [Daniel] into GPT0.
  4. GPT0 predicts [and] as the next word.
  5. Re-feed [name], [is], [Daniel], [and] into GPT0.
    and so on.
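In runnable form, that same re-feeding loop against the real API might look roughly like this; the engine name, settings, and the crude character-based trimming are placeholder assumptions.

    import openai

    openai.api_key = "YOUR_API_KEY"
    WINDOW_CHARS = 6000                 # crude stand-in for the ~2,048-token window

    def continue_story(story, chunk_tokens=150):
        prompt = story[-WINDOW_CHARS:]  # keep only the tail that still fits the window
        response = openai.Completion.create(
            engine="davinci",
            prompt=prompt,
            max_tokens=chunk_tokens,
            temperature=0.8,
        )
        return story + response["choices"][0]["text"]

    story = "Hello my name is"
    for _ in range(5):                  # grow the text chunk by chunk
        story = continue_story(story)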

So yes, your current method makes sense. The issue now is the window size. Clearly an infinite window size is not feasible. GPT-5 could have a window size of 2^16 (65,536) tokens or something (GPT-4 seems to be just a more efficient GPT-3). Another option is to allow any window size up to the input size. This is possible; it just sounds complicated implementation-wise for batches, and the optimization algos would also need to be edited. That could allow “infinite” sequences.

Another option, as I mentioned, is a Memory Matrix. Essentially, instead of GPTx forgetting your previous input, it keeps a “running summary” of it.

Say we have GPTm with 4 tokens and a memory of 2 sequences, and we start with “Hello my name is”.

  1. Input [Hello], [my], [name], [is] AND Memory = [0],[0].
  2. GPTm predicts [Daniel] as the next word. Update Memory = [1],[2].
  3. Re-feed GPTm with [my], [name], [is], [Daniel] AND Memory = [1],[2].
  4. GPTm predicts [and] as the next word. Update Memory = [51],[22].
  5. Re-feed GPTm with [name], [is], [Daniel], [and] AND Memory = [51],[22].
    and so on.

Now GPTm “remembers” the long-term context of your novel. It can recall, from 10,000 sentences ago, the introduction, the plot lines, the characters, etc., inside the memory matrix.

Ryan, if you’d like to try to use your 7000 words as a prompt, I can set you up with some of my tools.

1 Like

Hey, all these links are down. Do you have others?

You can take a look at NimbleBooks.com. DM me if you want to set up an account.

Here’s my work on this idea. It needs a lot of help but I took it as far as I’m willing to at the moment. Please feel free to steal it. I will add the MIT license to it.

1 Like

As a full-time novelist, I have tried to use GPT-3 to generate long-form novels in a way similar to what these posts describe, but the results were very bad. The output of GPT-3 must be cherry-picked - only about 10% (or even less) of it is high quality - which makes it time- and money-consuming, and only the davinci engine can do the task, which costs a lot. So I only use GPT-3 for brainstorming.

3 Likes

It does seem like there are so many interesting avenues here to check out.
My idea was pretty simple: write the novel like a “markov process”. Prompt GPT-3 to write the first 3 paragraphs, say, then feed it half a prompt’s worth of what it just wrote and ask it to continue. At the top of the prompt, keep a short description of the entire concept of the novel, to try to give it some coherence.
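A rough sketch of what that loop might look like, assuming the Completions-era openai package; the concept text, engine, settings, and the half-split heuristic are all placeholders.

    import openai

    openai.api_key = "YOUR_API_KEY"
    CONCEPT = "A one-paragraph description of the novel's premise, characters and tone."

    def next_chunk(previous_chunk):
        # Fixed concept header at the top, plus the back half of what GPT-3 just wrote.
        tail = previous_chunk[len(previous_chunk) // 2:]
        response = openai.Completion.create(
            engine="davinci",
            prompt=CONCEPT + "\n\n" + tail,
            max_tokens=300,
            temperature=0.8,
        )
        return response["choices"][0]["text"]

    chunk = next_chunk("The first three paragraphs, written by hand or generated earlier.")
    chapters = [chunk]
    for _ in range(10):
        chunk = next_chunk(chunk)
        chapters.append(chunk)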

Haven’t done it yet but might try it soon.

1 Like