For authenticity, I suggest you keep the text as close to the original as possible. You certainly don’t need to lowercase everything. Are you familiar with Notepad++? It makes text editing easier. I suggest you check your text for non-ASCII characters, because those may cause errors with GPT-3. You can search for them by putting this into the search bar in Notepad++:
[^\x00-\x7F]+
Make sure “regular expression” is checked as the search mode. Once you find them, you’ll be able to pick a suitable replacement or just delete them. Removing double spaces is smart for saving tokens. Just search and replace 2 spaces with 1, and repeat until there are no double spaces left; each pass shortens longer runs, so this also gets rid of triple or longer runs of spaces. I hope that’s helpful as a start.
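If you'd rather script this than do it by hand, here is a rough Python sketch of the same two steps. It uses the same pattern as the Notepad++ search; the function names and the sample string are just ones I made up for illustration, so adapt them to your own files.

```python
import re

# Same pattern as the Notepad++ search: one or more characters outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii(text):
    """Return (offset, run) pairs for each run of non-ASCII characters."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", text)


if __name__ == "__main__":
    # Hypothetical sample: curly quotes (non-ASCII) plus double and triple spaces.
    sample = "He said \u201chello\u201d  and   left."
    for offset, run in find_non_ascii(sample):
        print(f"non-ASCII at offset {offset}: {run!r}")
    print(collapse_spaces(sample))
```

The regex in `collapse_spaces` handles runs of any length in one pass, so there is no need to repeat it the way you would with a plain 2-to-1 search and replace. It only reports the non-ASCII characters rather than replacing them, since the right replacement depends on your text.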