Gpt-3.5-turbo fine tuning help needed, very difficult situations

I would like to train GPT-3.5-turbo to always output a specific format with as little description as possible. I am more concerned with formatting than content, which should be generated as usual; I don’t want to influence that.

Let’s say I have the following data:

Topic of the post
Outline of the post
Keywords related to the topic of the post
Internal links related to the topic of the post [‘/url1/, ‘/url2/’] (x15)
Prepared <a href> links with image URLs (x10)
A list of hashtags related to the post [’#hashtag1’, ‘#hashtag2’] (x5)
I would like to train GPT-3.5-turbo on how to use these data. That is, when creating a post on a SELECTED TOPIC, to use the OUTLINE OF THE POST. Create the post in , without unnecessary comments or additional information, or it could be in Markdown. Main subheadings as <h2> tags, lesser ones as <h3> to <h6> if the sense is preserved. During the creation, weave in only appropriate internal links about 5 times, with a suitable alt=“” fitting the link. After each main content block, add one of the images, and overall just paste in the prepared <a href> link with appropriate font size, alt=“” with the post title or synonyms of the post title (keywords). Create the post using keywords whenever possible and sensible. Also, insert hashtags into the content or at the end with internal linking /tag/hashtag1/ with appropriate alt=“hashtag1”.

At the moment, my prompt for GPT-4 has about 2000 tokens, which is the same as the answer to it. However, the answer is not always consistent with what I expect, and the generation time is quite long. As a result, OpenAI often returns an error, and I am charged for about 3 times more tokens than a correct answer would actually require. I have reported this, but so far, no one at OpenAI has been able to solve this issue and not charge me if an answer is not generated. In the end, it is more profitable for them to charge higher fees for incorrect answers.

How can I prepare data for precise fine-tuning in order not to influence the content but only to create an output format template based on the system and user prompt? I would like to be able to send data like:
system:
outline of the post: [sketch]
keywords: [list of strings]
internal links: [list of links]
images: [list of URLs]
hashtags: [list of hashtags]
user:
Write a post about: TOPIC.

How do I need to prepare the data for training? Should I name individual blocks in some way? I don’t want to influence the content, only the form of the output data. Any examples? Help? Direction?

My current prompt for GPT-4 looks like this (maybe some help with optimization):

    system_prompt = (
        f"INSTRUCTION FOR OPENAI LANGUAGE MODEL API:\n"
        f"1. **Blog Topic**: {topic}.\n"
        f"2. **Blog Format**:\n"
        f"   - Generate content in HTML format.\n"
        f"   - Do NOT use <h1> for the blog topic as it is already added in the publish_to_wordpress function.\n"
        f"   - Use <h2> for subheadings, and <h3> to <h6> for smaller subsections and additional information"
        f" optimized for SEO.\n"
        f"   - Incorporate keywords from {keywords_list} based on their numerical value for better SEO optimization. "
        f"The higher the number, the more frequently the keyword should be used. Ensure that keywords are subtly and "
        f"coherently integrated into the text. They should not appear in consecutive sentences and must fit seamlessly "
        f"into grammatically correct and meaningful content.\n"
        f"   - You may use stylistic elements like bold, underline, and italics for SEO purposes.\n"
        f"   - Craft a complete blog post following the {outline}.\n"
        f"   - Each heading in the post must consist of three paragraphs.\n"
        f"   - Under every heading, incorporate at least one list or table. For tables, use HTML table"
        f" tags with thin black "
        f"borders and lines separating rows and columns.\n"
        f"3. **Internal Links**:\n"
        f"   - Weave in a maximum of 5-8 internal links, and ONLY(!) from the list(>):{relative_links}(<) contextually"
        f" into the content using "
        f"the '<a href>' HTML tag with alt=\"\" attribute containing the link name for SEO optimization.\n"
        f"   - Never use links not present in the relative links list.\n"
        f"   - If a link can't be contextually integrated into the content, better to omit it.\n"
        f"   - Links should be incorporated within the content, not just at the end.\n"
        f"4. **Hashtags**:\n"
        f"   - Place 5 hashtags from the {hashtags_gpt} list in appropriate places within the text."
        f" Create an internal link "
        f"in the format /tag/hashtag (without #) and add alt=\"\" as '#Hashtag OR_THE_BLOG_TOPIC or"
        f" SYNONYMS_OF_THE_BLOG_TOPIC'.\n"
        f"5. **Images**:\n"
        f"   - Insert images from the {pictures} list in the content using the '<img src>' HTML tag"
        f" with alt=\"\" attribute "
        f"containing the blog topic or synonyms for SEO optimization.\n"
        f"   - Directly below every image, using a font size of 0.75rem, position the associated"
        f" referral link, centered beneath "
        f"the image using appropriate HTML styling.\n"
        f"6. **Post Structure**:\n"
        f"   - Contemplate the post's structure and image placement before crafting.\n"
        f"   - The structure should follow: first blog content, first image, second blog content,"
        f" second image, third blog content, "
        f"third image, fourth blog content.\n"
        f"7. **General Guidelines**:\n"
        f"   - Never use placeholders. Always write the entire blog post in one go.\n"
        f"   - The post is intended for WordPress, so opening HTML tags like '<html>', '<head>',"
        f" and '<body>' aren't required.\n"
        f"   - Adhere to all the aforementioned points during post creation.\n"
    )

    user_prompt = (f"Write a complete blog post on the topic: {topic}.")

Hi and welcome to the Developer Forum!

You seem to be expecting the model to act like traditional software, in so much as you need a strict output format with a selective input. LLM’s can follow a template for output, i.e. show it an example of a formatted output and it will try to follow that, but fine tuning will always influence the way the model behaves, you can train it on lots of examples of various input prompts and then a specifically formatted output and it will learn that format, but it will also learn the style of the answers, so I’m not sure you can isolate that side of things.

It looks like you would be best off splitting this task into a number of AI API calls used in conjunction with traditional software to build up the output into a formatted whole from isolated input elements, you could for examples make one API call to isolate specific lists of items and then one to extract the topic, one to process keywords etc., requiring that the model try to do this in a single prompt is going to yield less than ideal results.

2 Likes

Thanks for your reply I totally agree with you, I even did a fitting of what you wrote, splitting the post plan into subsections and generating everything separately, then combining, etc. It makes sense, however, I wouldn’t be myself without looking for the ultimate possibilities of today’s LLM. So I created code that generates a random post without content, but with the form I need.
I’ll try to fine tune it and see what comes out of it.

Example:

<h2>1. Major Section 1</h2>
<h3>A. Subsection A</h3>
<p>Content <a href='/tag/appropriate_for_content_hashtag67' alt='appropriate_for_content_hashtag67'>#appropriate_for_content_hashtag67</a> appropriate_for_content_keyword02...</p>
<h3>B. Subsection B</h3>
<p>Content Content appropriate_for_content_keyword67 Content...</p>
<h3>C. Subsection C</h3>
<p><a href='/tag/appropriate_for_content_hashtag10' alt='appropriate_for_content_hashtag10'>#appropriate_for_content_hashtag10</a> appropriate_for_content_keyword76 <a href='/internal-link59/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> Content...</p>
<h3>D. Subsection D</h3>
<p>Content Content Content Content appropriate_for_content_keyword86...</p>
<center><img src='url: "https://urlcom/photo-81"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@61?utm_medium=referral\">Name61</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>
<h2>2. Major Section 2</h2>
<h3>A. Subsection A</h3>
<p>Content <a href='/internal-link67/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword60 Content <a href='/tag/appropriate_for_content_hashtag66' alt='appropriate_for_content_hashtag66'>#appropriate_for_content_hashtag66</a>...</p>
<h3>B. Subsection B</h3>
<p>Content Content Content appropriate_for_content_keyword16...</p>
<h3>C. Subsection C</h3>
<p>appropriate_for_content_keyword68 Content Content Content Content...</p>
<center><img src='url: "https://urlcom/photo-16"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@37?utm_medium=referral\">Name37</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>
<h2>3. Major Section 3</h2>
<h3>A. Subsection A</h3>
<p>Content <a href='/internal-link84/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword69...</p>
<h3>B. Subsection B</h3>
<p><a href='/internal-link69/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword47 <a href='/tag/appropriate_for_content_hashtag30' alt='appropriate_for_content_hashtag30'>#appropriate_for_content_hashtag30</a>...</p>
<center><img src='url: "https://urlcom/photo-88"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@67?utm_medium=referral\">Name67</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>
<h2>4. Major Section 4</h2>
<h3>A. Subsection A</h3>
<p>Content Content Content Content appropriate_for_content_keyword66 Content...</p>
<h3>B. Subsection B</h3>
<p>Content Content appropriate_for_content_keyword14 Content Content Content...</p>
<h3>C. Subsection C</h3>
<p>Content Content Content <a href='/tag/appropriate_for_content_hashtag69' alt='appropriate_for_content_hashtag69'>#appropriate_for_content_hashtag69</a> appropriate_for_content_keyword03 Content...</p>
<h3>D. Subsection D</h3>
<p><a href='/internal-link15/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> Content Content appropriate_for_content_keyword22 Content Content...</p>
<center><img src='url: "https://urlcom/photo-75"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@63?utm_medium=referral\">Name63</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>
<h2>5. Major Section 5</h2>
<h3>A. Subsection A</h3>
<p><a href='/tag/appropriate_for_content_hashtag70' alt='appropriate_for_content_hashtag70'>#appropriate_for_content_hashtag70</a> <a href='/internal-link97/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword90...</p>
<h3>B. Subsection B</h3>
<p><a href='/internal-link57/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword35 Content Content Content...</p>
<center><img src='url: "https://urlcom/photo-70"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@34?utm_medium=referral\">Name34</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>
<h2>6. Major Section 6</h2>
<h3>A. Subsection A</h3>
<p>Content appropriate_for_content_keyword72 Content...</p>
<h3>B. Subsection B</h3>
<p>Content Content <a href='/internal-link81/' alt='appropriate_for_content_internal_link'>appropriate_for_content_internal_link</a> appropriate_for_content_keyword59...</p>
<h3>C. Subsection C</h3>
<p>Content appropriate_for_content_keyword10 Content Content Content...</p>
<h3>D. Subsection D</h3>
<p>Content appropriate_for_content_keyword45 Content <a href='/tag/appropriate_for_content_hashtag22' alt='appropriate_for_content_hashtag22'>#appropriate_for_content_hashtag22</a> Content...</p>
<h3>E. Subsection E</h3>
<p>Content Content Content Content appropriate_for_content_keyword83...</p>
<center><img src='url: "https://urlcom/photo-60"' alt='SYNONYMS_OF_THE_BLOG_TOPIC'/></center>
<center><p style='font-size: 0.75rem;'>Photo by <a href=\"https://url.com/@50?utm_medium=referral\">Name50</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a></p></center>

Example in Markdown:

## A. Major Section A
### 1. Subsection 1
Content appropriate_for_content_keyword50 Content...
### 2. Subsection 2
appropriate_for_content_keyword98 Content Content Content Content [#appropriate_for_content_hashtag43](/tag/appropriate_for_content_hashtag43)...
### 3. Subsection 3
Content [appropriate_for_content_internal_link](/internal-link57/) Content Content appropriate_for_content_keyword64...
### 4. Subsection 4
[appropriate_for_content_internal_link](/internal-link61/) appropriate_for_content_keyword08 Content...
![Photo by <a href=\"https://url.com/@09?utm_medium=referral\">Name09</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-97")
## B. Major Section B
### 1. Subsection 1
Content Content Content Content Content appropriate_for_content_keyword13...
### 2. Subsection 2
Content appropriate_for_content_keyword22 Content Content...
### 3. Subsection 3
[appropriate_for_content_internal_link](/internal-link88/) appropriate_for_content_keyword51 [#appropriate_for_content_hashtag39](/tag/appropriate_for_content_hashtag39)...
### 4. Subsection 4
Content [appropriate_for_content_internal_link](/internal-link24/) Content appropriate_for_content_keyword65...
![Photo by <a href=\"https://url.com/@14?utm_medium=referral\">Name14</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-55")
## C. Major Section C
### 1. Subsection 1
Content Content appropriate_for_content_keyword32...
### 2. Subsection 2
Content Content Content Content appropriate_for_content_keyword68 Content...
### 3. Subsection 3
Content Content Content Content appropriate_for_content_keyword91...
![Photo by <a href=\"https://url.com/@40?utm_medium=referral\">Name40</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-97")
## D. Major Section D
### 1. Subsection 1
appropriate_for_content_keyword67 [#appropriate_for_content_hashtag20](/tag/appropriate_for_content_hashtag20) [appropriate_for_content_internal_link](/internal-link55/)...
### 2. Subsection 2
appropriate_for_content_keyword76 [appropriate_for_content_internal_link](/internal-link81/) Content Content...
### 3. Subsection 3
Content Content appropriate_for_content_keyword19 Content Content Content...
![Photo by <a href=\"https://url.com/@85?utm_medium=referral\">Name85</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-38")
## E. Major Section E
### 1. Subsection 1
Content [appropriate_for_content_internal_link](/internal-link32/) appropriate_for_content_keyword35 Content [#appropriate_for_content_hashtag18](/tag/appropriate_for_content_hashtag18)...
### 2. Subsection 2
appropriate_for_content_keyword52 [#appropriate_for_content_hashtag45](/tag/appropriate_for_content_hashtag45) Content...
### 3. Subsection 3
[#appropriate_for_content_hashtag66](/tag/appropriate_for_content_hashtag66) Content appropriate_for_content_keyword16 Content Content...
![Photo by <a href=\"https://url.com/@71?utm_medium=referral\">Name71</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-92")
## F. Major Section F
### 1. Subsection 1
appropriate_for_content_keyword62 Content Content...
### 2. Subsection 2
[appropriate_for_content_internal_link](/internal-link42/) appropriate_for_content_keyword28 Content Content Content...
![Photo by <a href=\"https://url.com/@32?utm_medium=referral\">Name32</a> on <a href=\"https://url.com/?utm_medium=referral\">Url</a>](url: "https://urlcom/photo-53")

Be interested in your findings, please keep us posted.

Ugh, I wanted to get rid of all those single lines for you and just put it all into a single triple quote f-string.

That is now beyond gpt-3.5-turbo’s capability to understand or follow.


Done and formatted by davinci-003 with last year’s superior operational brains and no backtalk:

system_prompt = f"""
INSTRUCTION FOR OPENAI LANGUAGE MODEL API:
1. **Blog Topic**: {topic}.
2. **Blog Format**:
   - Generate content in HTML format.
   - Do NOT use <h1> for the blog topic as it is already added in the publish_to_wordpress function.
   - Use <h2> for subheadings, and <h3> to <h6> for smaller subsections and additional information optimized for SEO.
   - Incorporate keywords from {keywords_list} based on their numerical value for better SEO optimization. The higher the number, the more frequently the keyword should be used. Ensure that keywords are subtly and coherently integrated into the text. They should not appear in consecutive sentences and must fit seamlessly into grammatically correct and meaningful content.
   - You may use stylistic elements like bold, underline, and italics for SEO purposes.
   - Craft a complete blog post following the {outline}.
   - Each heading in the post must consist of three paragraphs.
   - Under every heading, incorporate at least one list or table. For tables, use HTML table tags with thin black borders and lines separating rows and columns.
3. **Internal Links**:
   - Weave in a maximum of 5-8 internal links, and ONLY(!) from the list(>):{relative_links}(<) contextually into the content using the '<a href>' HTML tag with alt=\"\" attribute containing the link name for SEO optimization.
   - Never use links not present in the relative links list.
   - If a link can't be contextually integrated into the content, better to omit it.
   - Links should be incorporated within the content, not just at the end.
4. **Hashtags**:
   - Place 5 hashtags from the {hashtags_gpt} list in appropriate places within the text. Create an internal link in the format /tag/hashtag (without #) and add alt=\"\" as '#Hashtag OR_THE_BLOG_TOPIC or SYNONYMS_OF_THE....
(etc)
""".strip()

After killing 500 tokens 10x the cost to make a version that can be read, we can see you need to focus this more on just what gets the job done…

and if I couldn’t get a single writing task done with gpt-3.5-turbo, your prospects are not so bright either.

Try which GPT-4, by API, works for me :slight_smile:

    # Create article from the topic ###########################
    telegram(f"Generating article for {topic}", chat_id)

    system_prompt = (
        f"INSTRUCTION FOR OPENAI LANGUAGE MODEL API:\n"
        f"1. **Blog Topic**: {topic}.\n"
        f"2. **Blog Format**:\n"
        f"   - Generate content in HTML format.\n"
        f"   - Do NOT use <h1> for the blog topic as it is already added in the publish_to_wordpress function.\n"
        f"   - Use <h2> for subheadings, and <h3> to <h6> for smaller subsections and additional information"
        f" optimized for SEO.\n"
        f"   - Incorporate keywords from {keywords_list} based on their numerical value for better SEO optimization. "
        f"The higher the number, the more frequently the keyword should be used. Ensure that keywords are subtly and "
        f"coherently integrated into the text. They should not appear in consecutive sentences and must fit seamlessly "
        f"into grammatically correct and meaningful content.\n"
        f"   - You may use stylistic elements like bold, underline, and italics for SEO purposes.\n"
        f"   - Craft a complete blog post following the {outline}.\n"
        f"   - Each heading in the post must consist of three paragraphs.\n"
        f"   - Under every heading, incorporate at least one list or table. For tables, use HTML table"
        f" tags with thin black "
        f"borders and lines separating rows and columns.\n"
        f"3. **Internal Links**:\n"
        f"   - Weave in a maximum of 5-8 internal links, and ONLY(!) from the relative links list:{relative_links}"
        f" contextually"
        f" into the content using "
        f"the '<a href>' HTML tag with alt=\"\" attribute containing the link name for SEO optimization.\n"
        f"   - Only use links present in the relative links list.\n"
        f"   - If a link can't be contextually integrated into the content, better to omit it.\n"
        f"   - Links should be incorporated within the content, not just at the end.\n"
        f"4. **Hashtags**:\n"
        f"   - Place 5 hashtags from the {hashtags_gpt} list in appropriate places within the text."
        f" Create an internal link "
        f"in the format /tag/hashtag (without #) and add alt=\"\" as '#Hashtag OR THE_BLOG_TOPIC or"
        f" SYNONYMS_OF_THE_BLOG_TOPIC'.\n"
        f"5. **Images**:\n"
        f"   - Insert images from the {pictures} list in the content using the '<img src>' HTML tag"
        f" with alt=\"\" attribute "
        f"containing the blog topic or synonyms for SEO optimization.\n"
        f"   - Directly below every image, using a font size of 0.75rem, position the associated"
        f" referral link, centered beneath "
        f"the image using appropriate HTML styling.\n"
        f"6. **Post Structure**:\n"
        f"   - Contemplate the post's structure and image placement before crafting.\n"
        f"   - The structure should follow: first blog content, first image, second blog content,"
        f" second image, third blog content, "
        f"third image, fourth blog content.\n"
        f"7. **General Guidelines**:\n"
        f"   - Never use placeholders. Always write the entire blog post in one go.\n"
        f"   - The post is intended for WordPress, so opening HTML tags like '<html>', '<head>',"
        f" and '<body>' aren't required.\n"
        f"   - Adhere to all the aforementioned points during post creation.\n"
    )

    user_prompt = (f"Write a complete blog post on the topic: {topic}.")

    response = create_chat_completion("gpt-4", user_prompt, system_prompt, 5, 600)

    # If the response is empty, exit the script
    if not response:
        telegram("The response is empty.", chat_id)
        exit()

    pauza(5)
1 Like

To fine tune on a particular format, about the only thing I’d be concerned about is how much needs to be in the system prompt still.

Why “still”? Because there’s likely some logic that even 1000 examples can’t exactly explain.

This can be the case for classifiers, where from “banana” → “triple-ply” and “orange” → “corrugated”, the AI can’t infer where “kiwi” fits into the mix.

This output format and some of the rules seem pretty straightforward. The fully-developed output and its corresponding input can do a good job.

The data specs for generating individual posts I would specify in the user role.

then you have your system, user, and assistant defined.

Then there is just the labor or cost of using AI to write examples with this extensive prompt, and the churning through human inputs to make examples. Then proofread and edit to get to where it needs to be.