I started doing some investigation this morning for fun, and the rabbit hole gets deep quick.
This is all in the playground, before starting on the scripting tool, just to gather requirements for the code.
BACK COVER BLURB
I started by generating a back-cover blurb for a book, both with and without a prompt. This step was easy and works either way (davinci-instruct-beta was really good at the task).
FIRST PAGE
From the back-cover blurb, I started on the first page. I could have simply started from scratch, but I wanted to see how closely I could stick to what was in the blurb (this will be important for keeping track of the book as a whole without being prescriptive about what happens).
The first page differs from the following pages because it has nothing other than the back-cover blurb to start from, so I started with
BACK-COVER-BLURB: +[whatever I generated previously]+CHAPTER 1:
as the prompt, using davinci and returning 400 tokens to get about a page of a book.
If the page ends in a period, I assume it’s the end of a paragraph and store the entire page.
If the page ends in an incomplete paragraph, I trim that paragraph off, store the complete part of the page, and keep the trimmed piece to use as part of the prompt for the next page.
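Here is a rough sketch of the first-page prompt and the trim step, just to pin down the requirements. The function names are made up for illustration, and two things are assumptions on my part: that the prompt pieces are joined with newlines, and that paragraphs in the output are separated by newlines.

```python
from typing import Tuple


def build_first_page_prompt(blurb: str) -> str:
    # Assumption: the pieces are joined with newlines; the notes above
    # only say the blurb is followed by "CHAPTER 1:".
    return f"BACK-COVER-BLURB: {blurb}\nCHAPTER 1:"


def split_page(text: str) -> Tuple[str, str]:
    """Split generated text into (complete_page, incomplete_tail).

    If the text ends in a period, the whole page is treated as complete.
    Otherwise the last paragraph is trimmed off and returned separately
    so it can start the next prompt.
    """
    text = text.rstrip()
    if text.endswith("."):
        return text, ""
    # Assumption: paragraphs are separated by newlines.
    cut = text.rfind("\n")
    if cut == -1:
        # Only one paragraph and it is unfinished: nothing to store yet.
        return "", text
    return text[:cut].rstrip(), text[cut + 1:].lstrip()
```

The tail returned by `split_page` is what would be carried into the next page's prompt; a page ending in "?" or a closing quote would be treated as incomplete by this simple period check.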
SUMMARIZING THE PAGE (this is a function that will get repeated with every page generation)
The next step was to summarize the first page’s output. This needs to be done with a lower temperature. I tried instruct, but the best results I got were from davinci with a prompt like this:
After reading the following text, the teacher asked the student to summarize it.
+ [PAGE CONTENTS] +
The student provided the following summary:
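As a sketch, the summary prompt could be assembled like this. The function name and the blank lines between the pieces are my own choices, and the settings dictionary holds placeholder values only: I have not settled on a temperature or token count, beyond wanting the temperature low.

```python
def build_summary_prompt(page: str) -> str:
    # Wraps the page in the teacher/student framing described above.
    # Assumption: the three pieces are separated by blank lines.
    return (
        "After reading the following text, the teacher asked the student "
        "to summarize it.\n\n"
        f"{page.strip()}\n\n"
        "The student provided the following summary:"
    )


# Placeholder settings, NOT tested values: only "davinci" and "lower
# temperature" come from my notes, the numbers are illustrative.
SUMMARY_SETTINGS = {"engine": "davinci", "temperature": 0.3, "max_tokens": 100}
```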
SUBSEQUENT PAGES
My plan here is to loop through this step 200 times. This will most likely not lead to any recognizable narrative arc, but it will spit out 200 pages.
The prompt will contain:
- The back-cover blurb
- The recent events summary
- The last complete OR incomplete paragraph (as start)
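The loop I have in mind looks roughly like the sketch below. `generate` and `summarize` are hypothetical callables standing in for the actual completion calls, the `RECENT EVENTS:` label is a name I made up for the summary section, and splitting paragraphs on newlines is an assumption.

```python
from typing import Callable, List


def build_page_prompt(blurb: str, summary: str, start: str) -> str:
    # The three ingredients listed above, in order. The "RECENT EVENTS:"
    # label is hypothetical.
    return f"BACK-COVER-BLURB: {blurb}\nRECENT EVENTS: {summary}\n{start}"


def last_paragraph(text: str) -> str:
    # Assumption: paragraphs are separated by newlines.
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return paragraphs[-1] if paragraphs else ""


def write_pages(
    blurb: str,
    first_page: str,
    generate: Callable[[str], str],   # hypothetical: prompt -> next page
    summarize: Callable[[str], str],  # hypothetical: page -> summary
    steps: int = 200,
) -> List[str]:
    pages = [first_page]
    for _ in range(steps):
        summary = summarize(pages[-1])
        start = last_paragraph(pages[-1])
        pages.append(generate(build_page_prompt(blurb, summary, start)))
    return pages
```

Passing the two model calls in as functions keeps the loop testable in isolation, which should help when moving from the playground to the scripting tool.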
FINDINGS
I feel like I will have to run a grammar correction step after every generation. I have noticed that the engine can start dropping the ball if your text contains grammatical or spelling mistakes: it will incorporate them into its style.
If your frequency penalty is high, the engine will start penalizing punctuation in the later parts of the text, because punctuation tokens have already occurred many times by then. I don’t quite know how to stop that from happening other than keeping the generations short.
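To see why punctuation suffers, here is a toy model of the penalty as I understand it from the API documentation: each token's logit is reduced by its count so far times the frequency penalty, plus a flat presence penalty if it has appeared at all. The function name and the example numbers are made up for illustration, and the real engine works on token IDs rather than strings.

```python
from collections import Counter
from typing import Dict, List


def penalized_logits(
    logits: Dict[str, float],
    generated: List[str],
    frequency_penalty: float,
    presence_penalty: float = 0.0,
) -> Dict[str, float]:
    # logit[j] - count[j] * frequency_penalty - (count[j] > 0) * presence_penalty
    counts = Counter(generated)
    return {
        tok: logit
        - counts[tok] * frequency_penalty
        - (presence_penalty if counts[tok] > 0 else 0.0)
        for tok, logit in logits.items()
    }


# Illustrative numbers only: a period used 20 times versus a word used once.
adjusted = penalized_logits(
    {".": 5.0, "dragon": 1.0},
    ["."] * 20 + ["dragon"],
    frequency_penalty=0.5,
)
```

In this toy case the period starts out far more likely than "dragon" but ends up less likely after the penalty, which matches what I see late in long generations.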
Is there a mechanism where I can exclude tokens from being penalized? I think I’m going to open a thread on this.