Hi everyone,
I am very new to coding, so apologies if this seems misguided. I would be grateful for any advice. I have looked around and found various copyediting tools, but nothing quite like what I’m trying to create. I want to make a website where the user can submit a .txt file of basically any length (maybe up to 500,000 words, as there has to be a cap somewhere!) and get back a .txt file that has been copyedited.
I’ve been using the openai.Completion.create method. I tried the openai.Edit.create version and found it wasn’t very strong: it missed a lot of easy mistakes, such as “he” instead of “the”.
I started by splitting the submitted .txt file into paragraphs, making API calls with chunks of about 1,000 words, and writing the results straight into a new .txt file. This worked pretty well, but if someone submits a file with gigantic paragraphs, a chunk can exceed the token limit. So now I’m trying to break the text down by character count instead, but I think this is hurting my results, because GPT now keeps wanting to add content to the text instead of making simple corrections. I may try the format shown in the grammar example, but I wanted to ask for feedback before working on that. Is there a simpler way to get results back? I could not figure out how to submit an entire file to the API and ask for results, so I’m trying this method for now.
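For comparison, here is a minimal sketch of the earlier paragraph-based approach with a fallback for gigantic paragraphs. The helper name chunk_paragraphs and the max_words parameter are hypothetical, and a word count is only a rough stand-in for the model's token count, so the budget would need some headroom.

```python
def chunk_paragraphs(text, max_words=1000):
    """Group paragraphs into chunks of at most max_words words each.

    Hypothetical helper, not from the original code. A paragraph longer
    than max_words is split on word boundaries so no chunk exceeds the budget.
    """
    chunks = []
    current = []        # paragraphs collected for the chunk being built
    current_words = 0   # total word count of the paragraphs in `current`

    def flush():
        # Emit the chunk being built (if any) and start a new one.
        nonlocal current, current_words
        if current:
            chunks.append("\n\n".join(current))
        current, current_words = [], 0

    for paragraph in text.split("\n"):
        words = paragraph.split()
        if not words:
            continue  # skip blank lines between paragraphs
        if len(words) > max_words:
            # Oversized paragraph: emit what we have, then split it by words.
            flush()
            for start in range(0, len(words), max_words):
                chunks.append(" ".join(words[start:start + max_words]))
            continue
        if current_words + len(words) > max_words:
            flush()
        current.append(paragraph.strip())
        current_words += len(words)

    flush()
    return chunks
```

Each returned chunk could then be sent in its own API call. One limitation of this sketch: the pieces of a split paragraph come back as separate chunks, so they would need to be rejoined with a space rather than a blank line when the edited file is written.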
import openai

def run_editor(key):
    # Read the submitted file; the with block closes it automatically.
    with open("uploads/original.txt", "r", encoding="utf-8", errors="ignore") as f:
        original_text = f.read()
    # Rebuild the text into one string so paragraphs are separated by a blank line.
    # split("\n") removes the "\n" characters, so they are added back here.
    submit_text = ""
    for paragraph in original_text.split("\n"):
        submit_text += paragraph
        submit_text += "\n\n"
    # Opening with "w" clears any previous edited.txt before the results are written.
    with open("uploads/edited.txt", "w", encoding="utf-8", errors="ignore") as edited_text:
        # Copy roughly the first 4000 characters of submit_text into submit_chunk,
        # delete them from submit_text, and finish when submit_text is empty.
        while submit_text:
            adjust = 0
            if len(submit_text) > 4000:  # avoids out-of-bounds error when at the end
                # Extend the cut to the next space so a chunk never ends mid-word.
                # The length check prevents an IndexError if no space follows.
                while 3999 + adjust < len(submit_text) and submit_text[3999 + adjust] != " ":
                    adjust += 1
            submit_chunk = submit_text[:4000 + adjust]
            submit_text = submit_text[4000 + adjust:]
            edited_text.write(openai_api(key, submit_chunk))
def openai_api(key, submitted_text):
    openai.api_key = key  # passed in from HTML page, or wherever
    prompt = "Act like a copyeditor and proofreader and edit this manuscript according to the Chicago Manual of Style. Focus on punctuation, grammar, syntax, typos, capitalization, formatting and consistency. Format all numbers according to the Chicago Manual of Style, spelling them out if necessary. Use italics and smart quotes. Ignore errors of fragmented sentences. Do not complete the end of the text. Begin here:\n\n"
    prompt += submitted_text
    chatgpt_response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.2,
        max_tokens=2000,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )["choices"][0]["text"]  # grab string part of response only
    return chatgpt_response
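A completion model continues whatever text it is given, so a chunk that is cut off mid-sentence at the very end of the prompt can invite the model to keep writing. One thing that might help is to mark clearly where the text to edit starts and stops, and to end the prompt with a cue for the corrected version. The sketch below is an assumption rather than a tested fix: the build_prompt helper, the tag delimiters, and the instruction wording are all hypothetical.

```python
# Hypothetical instruction text; the wording is an assumption, not a tested prompt.
INSTRUCTIONS = (
    "Copyedit the text between the <text> tags. Fix punctuation, grammar, "
    "typos, and capitalization. Do not add, remove, or continue any content. "
    "Return only the corrected text."
)

def build_prompt(chunk):
    """Wrap a chunk in explicit delimiters so the text to edit has a clear end."""
    # The closing tag and the final cue line mean the prompt no longer ends
    # in the middle of the manuscript itself.
    return f"{INSTRUCTIONS}\n\n<text>\n{chunk.strip()}\n</text>\n\nCorrected text:\n"
```

The returned string could be passed as the prompt argument in place of the concatenated prompt in openai_api. The longer style instructions from the original prompt could be folded into INSTRUCTIONS in the same way.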