I would like to train GPT-3.5-turbo to always output a specific format with as little description as possible. I am more concerned with formatting than content, which should be generated as usual; I don’t want to influence that.
Let’s say I have the following data:
Topic of the post
Outline of the post
Keywords related to the topic of the post
Internal links related to the topic of the post [‘/url1/, ‘/url2/’] (x15)
Prepared <a href> links with image URLs (x10)
A list of hashtags related to the post [’#hashtag1’, ‘#hashtag2’] (x5)
I would like to train GPT-3.5-turbo on how to use these data. That is, when creating a post on a SELECTED TOPIC, to use the OUTLINE OF THE POST. Create the post in , without unnecessary comments or additional information, or it could be in Markdown. Main subheadings as <h2> tags, lesser ones as <h3> to <h6> if the sense is preserved. During the creation, weave in only appropriate internal links about 5 times, with a suitable alt=“” fitting the link. After each main content block, add one of the images, and overall just paste in the prepared <a href> link with appropriate font size, alt=“” with the post title or synonyms of the post title (keywords). Create the post using keywords whenever possible and sensible. Also, insert hashtags into the content or at the end with internal linking /tag/hashtag1/ with appropriate alt=“hashtag1”.
At the moment, my prompt for GPT-4 has about 2000 tokens, which is the same as the answer to it. However, the answer is not always consistent with what I expect, and the generation time is quite long. As a result, OpenAI often returns an error, and I am charged for about 3 times more tokens than a correct answer would actually require. I have reported this, but so far, no one at OpenAI has been able to solve this issue and not charge me if an answer is not generated. In the end, it is more profitable for them to charge higher fees for incorrect answers.
How can I prepare data for precise fine-tuning in order not to influence the content but only to create an output format template based on the system and user prompt? I would like to be able to send data like:
system:
outline of the post: [sketch]
keywords: [list of strings]
internal links: [list of links]
images: [list of URLs]
hashtags: [list of hashtags]
user:
Write a post about: TOPIC.
How do I need to prepare the data for training? Should I name individual blocks in some way? I don’t want to influence the content, only the form of the output data. Any examples? Help? Direction?
My current prompt for GPT-4 looks like this (maybe some help with optimization):
system_prompt = (
f"INSTRUCTION FOR OPENAI LANGUAGE MODEL API:\n"
f"1. **Blog Topic**: {topic}.\n"
f"2. **Blog Format**:\n"
f" - Generate content in HTML format.\n"
f" - Do NOT use <h1> for the blog topic as it is already added in the publish_to_wordpress function.\n"
f" - Use <h2> for subheadings, and <h3> to <h6> for smaller subsections and additional information"
f" optimized for SEO.\n"
f" - Incorporate keywords from {keywords_list} based on their numerical value for better SEO optimization. "
f"The higher the number, the more frequently the keyword should be used. Ensure that keywords are subtly and "
f"coherently integrated into the text. They should not appear in consecutive sentences and must fit seamlessly "
f"into grammatically correct and meaningful content.\n"
f" - You may use stylistic elements like bold, underline, and italics for SEO purposes.\n"
f" - Craft a complete blog post following the {outline}.\n"
f" - Each heading in the post must consist of three paragraphs.\n"
f" - Under every heading, incorporate at least one list or table. For tables, use HTML table"
f" tags with thin black "
f"borders and lines separating rows and columns.\n"
f"3. **Internal Links**:\n"
f" - Weave in a maximum of 5-8 internal links, and ONLY(!) from the list(>):{relative_links}(<) contextually"
f" into the content using "
f"the '<a href>' HTML tag with alt=\"\" attribute containing the link name for SEO optimization.\n"
f" - Never use links not present in the relative links list.\n"
f" - If a link can't be contextually integrated into the content, better to omit it.\n"
f" - Links should be incorporated within the content, not just at the end.\n"
f"4. **Hashtags**:\n"
f" - Place 5 hashtags from the {hashtags_gpt} list in appropriate places within the text."
f" Create an internal link "
f"in the format /tag/hashtag (without #) and add alt=\"\" as '#Hashtag OR_THE_BLOG_TOPIC or"
f" SYNONYMS_OF_THE_BLOG_TOPIC'.\n"
f"5. **Images**:\n"
f" - Insert images from the {pictures} list in the content using the '<img src>' HTML tag"
f" with alt=\"\" attribute "
f"containing the blog topic or synonyms for SEO optimization.\n"
f" - Directly below every image, using a font size of 0.75rem, position the associated"
f" referral link, centered beneath "
f"the image using appropriate HTML styling.\n"
f"6. **Post Structure**:\n"
f" - Contemplate the post's structure and image placement before crafting.\n"
f" - The structure should follow: first blog content, first image, second blog content,"
f" second image, third blog content, "
f"third image, fourth blog content.\n"
f"7. **General Guidelines**:\n"
f" - Never use placeholders. Always write the entire blog post in one go.\n"
f" - The post is intended for WordPress, so opening HTML tags like '<html>', '<head>',"
f" and '<body>' aren't required.\n"
f" - Adhere to all the aforementioned points during post creation.\n"
)
user_prompt = (f"Write a complete blog post on the topic: {topic}.")