Dear all,
I’m currently using the latest GPT-4 model API to refine my script, which is about 15,000 tokens long. I’m not sure whether the API lets me upload a text file directly, as the ChatGPT Plus interface does, so I’m passing the script as text input.
The issue I’ve encountered is that the API output contains only 4,095 tokens, and I suspect that the API didn’t process my entire script.
To confirm my suspicion, I added a prompt asking it to count the lines and words in my input script, and the response seemed to confirm that some content was missing. The API response reported 159 lines with a combined count of 4,016 words. However, the actual dialogue script has 423 lines and approximately 9,615 words.
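For reference, the line and word counts of the input and output files can also be computed locally, which avoids relying on the model's own count. The following is a minimal sketch, assuming a line (row) is any non-empty line and a word is any whitespace-separated token; the function name `count_lines_and_words` is hypothetical.

```python
from typing import Tuple


def count_lines_and_words(text: str) -> Tuple[int, int]:
    """Return (number of non-empty lines, number of whitespace-separated words)."""
    lines = [line for line in text.splitlines() if line.strip()]
    words = text.split()
    return len(lines), len(words)


if __name__ == "__main__":
    # Tiny illustrative sample; in practice, read sample2.txt or gpt_response.txt.
    sample = "ALICE: Hello there.\n\nBOB: Hi, Alice. How are you?\n"
    n_lines, n_words = count_lines_and_words(sample)
    print(f"{n_lines} lines, {n_words} words")
```

Running the same function on both `sample2.txt` and `gpt_response.txt` would show how much shorter the output is than the input.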
Could you please advise me on how to handle this issue?
Do I need to change the text format or configure specific GPT parameters?
My current code is quite simple and is shown below:
import openai  # uses the pre-1.0 openai package interface (openai.ChatCompletion)
import tiktoken

# Read the input script
with open('sample2.txt', 'r', encoding='utf-8') as file:
    input_text = file.read()

# Count the input tokens
encoding = tiktoken.encoding_for_model("gpt-4-1106-preview")  # gpt-3.5-turbo gpt-4 gpt-4-1106-preview
token_count = len(encoding.encode(input_text))
print(f"The text contains {token_count} tokens.")  # about 15000 tokens

# The script itself is sent once, as the user message
system_prompt = "As a professional screenwriter, please improve the following script."
response = openai.ChatCompletion.create(
    model="gpt-4-1106-preview",  # gpt-3.5-turbo-16k gpt-4-1106-preview gpt-3.5-turbo-1106
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": input_text}
    ]
)
gpt_response = response['choices'][0]['message']['content']

# Save the model output
with open('gpt_response.txt', 'w', encoding='utf-8') as file:
    file.write(gpt_response)
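One possible explanation is that gpt-4-1106-preview accepts a long input but returns at most 4,096 output tokens per request, so a rewritten 15,000-token script cannot fit in a single reply. In that case the `finish_reason` field of the response is `"length"`. A possible workaround, sketched below under assumptions, is to split the script into smaller pieces and send each piece in its own request. The helper names `split_by_token_budget` and `was_truncated` are hypothetical, and the word-based counter in the demo is only a stand-in for `len(encoding.encode(...))` from tiktoken.

```python
from typing import Callable, List


def split_by_token_budget(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Group consecutive lines into chunks of roughly max_tokens each.

    Token counts are summed line by line, so the total is approximate.
    A single line larger than the budget becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in text.splitlines(keepends=True):
        n = count_tokens(line)
        # Start a new chunk if adding this line would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    if current:
        chunks.append("".join(current))
    return chunks


def was_truncated(response) -> bool:
    """True if the first choice stopped because it hit the output token limit."""
    return response["choices"][0]["finish_reason"] == "length"


if __name__ == "__main__":
    # Stand-in counter (whitespace words) so the demo runs without tiktoken.
    def word_count(s: str) -> int:
        return len(s.split())

    script = "A: one two\nB: three four five\nC: six\n"
    for i, chunk in enumerate(split_by_token_budget(script, 5, word_count)):
        print(i, repr(chunk))
```

Each chunk could then be sent as the user message in its own request, with a budget chosen well below 4,096 tokens so the improved text has room to fit, and `was_truncated(response)` checked after every call. Splitting on line boundaries keeps each dialogue line intact, though the model will not see context from the other chunks.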