Cannot get gpt-4o-mini to follow instructions

I have a text consisting of Biblical scriptures: https://s3.us-west-2.amazonaws.com/docs.scbbs.com/docs/test/text_with_line_nums.txt

I have prepended line numbers to make it easy for the model to identify individual lines.

I want the model to create an outline of the verses in the
text. Only verse numbers which begin a line can
be used as title segments.

However, the models, both gpt-4o and gpt-4o-mini, refuse to follow my
instructions.

Lines that begin like this are accepted as title segments:

line0001: 1 13 …
line0015: 4 8 …

But lines which do NOT begin with a verse number are not accepted.
This verse number cannot be used to begin a title segment:

line0005: …in the verse will be fulfilled in him ( Talmidei Rabbeinu Yona ). 2 9 …

Yet, the model always lists 2 9 in the returned outline.

Here is an example from the Playground. Note that when I call the model
out on this, it corrects that one line but continues to make the same
mistake with other lines:

https://platform.openai.com/playground/chat?models=gpt-4o-mini&preset=preset-FBRqhY1DaCYBgTU7eyrUvcE7

Here are the prompt instructions:

Task: Analyze the following line-numbered Talmud scripture and generate a JSON representation of its semantic structure, organizing content by ideas rather than strict verse divisions.

Output Format:

The JSON output should be an array of objects, where each object represents a segment of the document’s semantic structure and has the following properties:

  • title: The verse number followed by a concise summary of the main idea or topic of the segment (maximum 10 words total, including the verse number). If the summary exceeds the allowed length, include only the first words that fit within the limit followed by “…”. A title segment must begin a line. The verse number must be the first characters on the line.
  • level: Always set to 1, as there will only be one level in the hierarchy.
  • start_line: The value of the “linenoxxxx:” marker for the line where the segment begins. This must be the line with the verse number that begins the line.
  • has_children: Always set to “N”, as there will be no child segments.
  • children: An empty array, as there will be no child segments.

Segmentation Guidelines:

  1. A new segment can ONLY begin at the start of a new line where a verse number (e.g., “1 13” or “16 16”) appears as the very first content on that line. This is critical and must be strictly followed.

  2. If a verse number appears in the middle or at the end of a line, it cannot be used as the start of a new segment. The next valid segment will begin with the next verse number that appears at the start of a line. If a verse number starts anywhere except the beginning of the line, then the semantic idea of that verse must be included with the previous verse.

    For example, these verse numbers can begin title segments:

    line0001: 1 13 ...
    line0015: 4 8 ...
    

    But this verse number cannot be used to begin a title segment:

    line0011: and continue to live unaffected by them. 3 13 ...
    

    By “start of line,” we mean that the verse number MUST be the first characters after the line0000: marker.

  3. Group verses together based on shared ideas or topics. A segment may include one or more verses if they discuss the same concept.

  4. Create new segments when there’s a significant shift in topic, argument, or speaker, but only if condition #1 is met.

  5. Ensure that each segment represents a coherent idea or discussion point.

Additional Instructions:

  • Pay close attention to where verse numbers appear. Only those at the very beginning of a line can start a new segment.
  • If a verse number appears mid-line or at the end of a line, it should be considered part of the current segment, not the start of a new one.
  • Each proposed segment must be verified to ensure it starts with a verse number at the beginning of the line. If it does not, exclude it from the output and report the discrepancy.
  • Ensure that the entire input text is accounted for in the output JSON, with no content omitted.

Example Output Structure:

[
  {
    "title": "1 13 Rabbi Yosei on synagogue blessings",
    "level": 1,
    "start_line": "line0001",
    "has_children": "N",
    "children": []
  },
  {
    "title": "3 13 Discussion on Divine Presence",
    "level": 1,
    "start_line": "line0068",
    "has_children": "N",
    "children": []
  },
  ...
]

Only return the JSON object. Do not include any explanatory text or additional formatting.

Important Note: The verse number must the first characters in the linexxxx: in order to be used as a title segment.

End System Message

This is what is consistently returned:

[
  {
    "title": "1 13 Rabbi Yosei on synagogue blessings",
    "level": 1,
    "start_line": "line0001",
    "has_children": "N",
    "children": []
  },
  {
    "title": "2 9 Discussion on the power of sight",
    "level": 1,
    "start_line": "line0005",
    "has_children": "N",
    "children": []
  },
  {
    "title": "3 13 Abaye on demon abundance",
    "level": 1,
    "start_line": "line0011",
    "has_children": "N",
    "children": []
  },
  {
    "title": "4 8 Rav Huna on the protection from demons",
    "level": 1,
    "start_line": "line0015",
    "has_children": "N",
    "children": []
  },
  {
    "title": "5 30 Rava summarizing effects of demons",
    "level": 1,
    "start_line": "line0020",
    "has_children": "N",
    "children": []
  },

etc…

Any suggestions?


My first thought would be to parse the text into verses first and put one verse on each line. That way the AI doesn’t have to “guesstimate”. Your system instructions are also a bit long and detailed, which is likely why it’s stumbling. I’d work on feeding it cleaner data with pre-processing if possible.
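A minimal sketch of that pre-processing idea, run on the raw text before the line numbers are prepended. It assumes a verse marker is two short integers (such as "2 9" or "16 16") that follow sentence-ending punctuation; that pattern is a guess based on the lines quoted above and would need tuning against the real file. The names `split_on_verses` and `renumber` are made up for illustration.

```python
import re

# Assumption: a verse marker is two short integers ("2 9", "16 16") that
# directly follow sentence-ending punctuation. This is a heuristic and
# will produce false positives on ordinary number pairs in the text.
VERSE_BREAK = re.compile(r"(?<=[.!?)])\s+(?=\d{1,3} \d{1,3}\s)")


def split_on_verses(raw_line: str) -> list[str]:
    """Split one raw line so that every verse marker starts its own line."""
    return [part.strip() for part in VERSE_BREAK.split(raw_line) if part.strip()]


def renumber(raw_lines: list[str]) -> list[str]:
    """Split all lines on verse markers, then prepend lineNNNN: markers."""
    pieces: list[str] = []
    for raw in raw_lines:
        pieces.extend(split_on_verses(raw))
    return [f"line{i:04d}: {text}" for i, text in enumerate(pieces, start=1)]
```

With the text reshaped this way, every verse number sits directly after its line marker, so the model no longer has to tell line-initial verses from mid-line ones.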


Good to “see” you here again @SomebodySysop!

I would re-organize the segmentation guidelines a bit and be more consistent in wording. Specifically, I would consistently refer to the line0000: marker instead of “start of line” and introduce this definition right at the beginning of your segmentation guidelines (as opposed to the end of point 2).

Also, if you refine the segmentation guidelines, then you can likely get rid of part of the additional instructions and just focus on what is not already covered.


Giving up the struggle of trying to get the model to “read between the lines”, so to speak.

The potential problem is that, by starting a segment with a verse that starts in the middle of a line, the end line of the preceding segment would be cut short (because in my line-numbering methodology, I always end a segment one line before the next segment starts).

Rather than convince the model to not do this (nearly impossible), the easier solution is to incorporate the legacy “overlap” strategy to the process. Since I don’t know when a verse will start in the middle of a line, I simply add the first line of the following segment as the last line of the current segment, or “overlap” the following segment by one line.

So, yes, that line would be duplicated as the end line of one segment and the start line of the next.
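The overlap described above can be sketched roughly as follows. This assumes the outline's start lines have already been converted to integers (for example "line0005" to 5) and that the total line count is known; `segment_ranges` and its `overlap` parameter are hypothetical names, not part of any existing pipeline.

```python
def segment_ranges(
    start_lines: list[int], total_lines: int, overlap: int = 1
) -> list[tuple[int, int]]:
    """Return inclusive (start, end) line ranges for each segment.

    Without overlap a segment ends one line before the next one starts.
    With overlap=1 it also takes the first line of the next segment, so a
    verse that begins mid-line is not cut off from the segment it belongs to.
    """
    starts = sorted(set(start_lines))
    ranges = []
    for i, start in enumerate(starts):
        if i + 1 < len(starts):
            end = min(starts[i + 1] - 1 + overlap, total_lines)
        else:
            end = total_lines  # the last segment runs to the end of the text
        ranges.append((start, end))
    return ranges


# Start lines taken from the outline quoted earlier; 20 is an assumed length.
print(segment_ranges([1, 5, 11, 15], total_lines=20))
# [(1, 5), (5, 11), (11, 15), (15, 20)]
```

The cost is one duplicated line per boundary, which is usually small next to the risk of splitting a verse away from its context.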

The data file seems to have several corrupted lines. My first thought is that this unsanitized data may be distracting and hard to process. Cleaning would be ideal. First batching it through a rewriting AI or filtering code that can identify a dominance of non-English in a line could help.
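As a rough illustration of the filtering idea, the sketch below flags lines in which most non-whitespace characters fall outside ASCII. That is only a crude proxy for "dominantly non-English" (it would flag both real Hebrew and mis-decoded bytes), and the 0.5 threshold and function names are arbitrary assumptions.

```python
def non_ascii_ratio(line: str) -> float:
    """Fraction of non-whitespace characters that are not plain ASCII."""
    chars = [ch for ch in line if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(not ch.isascii() for ch in chars) / len(chars)


def flag_suspect_lines(lines: list[str], threshold: float = 0.5) -> list[int]:
    """Return 1-based indexes of lines dominated by non-ASCII characters."""
    return [
        i
        for i, line in enumerate(lines, start=1)
        if non_ascii_ratio(line) > threshold
    ]
```

Flagged lines could then be dropped, repaired, or reviewed by hand before the text is sent to the model.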

Secondly, consider gpt-4o and -mini to be a small-context-observation model, performing like its attention mechanism is some kind of RAG retrieval. It does not observe all-at-once well. It is much better at processing THROUGH documentation than trying to assemble quality analysis from the bulk as output without steps of forethought. Try GPT-4-0613 and you will likely see performance and attention you expect.

Finally, I had o1 re-tune the prompt, acting as a prompt buddy for your dumber model. See if this doesn’t implement the same logic, described significantly more clearly. If it is wrong, that means that OpenAI’s flagship model still couldn’t understand what you want after a few rounds of my own guidance.



Task: Analyze the provided line-numbered Biblical text and generate a JSON array representing its semantic structure, organized by ideas.

Output Format: Each element in the JSON array must be an object with the following properties:

  • title: The verse number followed by a concise summary of the segment’s main idea (maximum of 10 words total, including the verse number). If the summary exceeds 10 words, truncate and add “…”.
  • level: Always set to "1".
  • start_line: The line identifier (e.g., "line0001") where the segment begins. This line must start with the verse number immediately after the "lineXXXX:" marker.
  • has_children: Always set to "N".
  • children: An empty array [].

Segmentation Rules:

  1. Segment Start Condition:

    • A new segment only begins at a line where a verse number (e.g., "1 13" or "16 16") is the very first content immediately after the "lineXXXX:" marker.
    • There must be no characters, spaces, or symbols before the verse number on that line.
  2. Mid-line or End-line Verse Numbers:

    • If a verse number appears anywhere else in a line (mid-line or end-line), it is considered part of the current segment and cannot start a new segment.
    • Do not create a new segment for such verse numbers.
  3. Grouping Content:

    • Include all consecutive lines in a segment until the next valid segment start is encountered.
    • Lines that do not start with a verse number are included in the current segment.
  4. Complete Coverage:

    • Ensure all lines from the input text are included in the segments without omission.
    • Do not skip any content between segments.

Examples:

  • Valid Segment Start:

    line0001: 1 13 Rabbi Yosei on synagogue blessings...
    
  • Invalid Segment Start (do not start a new segment here):

    line0005: ...in the verse will be fulfilled in him (Talmidei Rabbeinu Yona). 2 9 ...
    

Important Notes:

  • Strict Compliance:

    • Apply the segmentation rules strictly and consistently throughout the entire text.
    • Do not include verse numbers that do not meet the Segment Start Condition as segment titles.
  • Verification:

    • Before starting a new segment, verify that the line begins exactly with a verse number immediately after the line identifier.
  • No Additional Formatting:

    • Provide only the JSON array as the output.
    • Do not include explanations, headers, or any additional text.

Example Output:

[
  {
    "title": "1 13 Rabbi Yosei on synagogue blessings",
    "level": "1",
    "start_line": "line0001",
    "has_children": "N",
    "children": []
  },
  {
    "title": "4 8 Discussion on Divine Presence",
    "level": "1",
    "start_line": "line0015",
    "has_children": "N",
    "children": []
  }
]

(quote this post to get the markdown version of the prompt)


The prompting should be a user role task. System should just instill confidence in the specialization the AI will perform.

Structured response format can produce the output without deviation from a schema. The entirety of desired JSON objects could be in a named array, so that multiple can be output.
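For illustration, a schema along these lines could be passed as the `response_format` parameter of a chat completion request. The wrapper key `segments` and the schema name `outline` are made up here, the per-segment fields are copied from the prompt, and the exact set of JSON Schema keywords accepted in strict mode should be checked against the current OpenAI documentation.

```python
import json

# Per-segment fields as described in the prompt; "level" is an integer here,
# as in the original example output.
segment_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "level": {"type": "integer"},
        "start_line": {"type": "string"},
        "has_children": {"type": "string", "enum": ["N"]},
        "children": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "level", "start_line", "has_children", "children"],
    "additionalProperties": False,
}

# Hypothetical wrapper: the root is an object holding the array of
# segments under a named key.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "outline",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "segments": {"type": "array", "items": segment_schema}
            },
            "required": ["segments"],
            "additionalProperties": False,
        },
    },
}

payload = json.dumps(response_format, indent=2)  # ready to inspect or send
```

A schema only constrains the shape of the output. It cannot check that a given start_line really begins with a verse number, so the mid-line problem has to be handled separately.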

(Just when you thought prompt engineering was behind us, we get models that can’t follow instructions again.)


Thank you very much for the effort! Unfortunately, it still does the same with gpt-4o-mini and even gemini-1.5-flash (creating title segments for mid-line verses). As I want to use this in production for semantic chunking, I need to be able to use the cheapest models for that.
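Since the rule itself is mechanical, one option with the cheap models is to check their output in code afterwards. The sketch below assumes the numbered text uses "lineNNNN:" markers with four digits and that a verse number is two integers separated by a space, as in the examples above; the function names are hypothetical.

```python
import re

LINE_RE = re.compile(r"^(line\d{4}):\s*(.*)$")
VERSE_START = re.compile(r"^\d+ \d+\b")  # assumed verse format, e.g. "1 13"


def valid_start_lines(numbered_text: str) -> set[str]:
    """Collect the line ids whose content begins with a verse number."""
    valid = set()
    for line in numbered_text.splitlines():
        match = LINE_RE.match(line)
        if match and VERSE_START.match(match.group(2)):
            valid.add(match.group(1))
    return valid


def drop_invalid_segments(segments: list[dict], numbered_text: str) -> list[dict]:
    """Keep only segments whose start_line really begins with a verse number."""
    valid = valid_start_lines(numbered_text)
    return [seg for seg in segments if seg["start_line"] in valid]
```

Dropping an invalid segment folds its lines into the preceding one, which matches the rule that a mid-line verse belongs to the previous segment.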

That corrupted text is untranslated Hebrew characters. The text that I’ve been testing as well as the text that is extracted and embedded has the correct Hebrew characters. I don’t know why that text looks like that when uploaded to Amazon S3.

It is a bit stunning how the models refuse to follow the instructions, even when I point out the error in the Playground.

Hopefully in a few months the prices will come down on o1, which does do a much better job. In the meantime, I guess I’ll have to settle for my band-aid “overlap” approach.

Again, many thanks for chipping in!

Have you considered a fine-tuned model using the new distillation capability?

I’m actually in the process myself of creating a fine-tuned gpt-4o for select parts of the semantic chunking process for a specific project of mine, using o1 outputs.

I’m not sure if a fine-tuned gpt-4o-mini will be capable enough but it might be worth a try?


So far, no. But I’ve been storing my completions offline for nearly two years now. I haven’t tried to fine-tune with this data because it is still mostly beta test data, as my applications are not yet released.

My plan from the very beginning was to amass several months of live completion history from real users, rank those, and use the best ones to fine-tune a model that would be used for future completions.

In looking at the current OpenAI distillation capability, those completions are only stored for 30 days.

And, since you can’t update or “re-train” trained models, I figured it was best to wait until I had actual real world completions to work with.

So, while my semantic chunking methodology was perfect for the legal agreements, religious scripture is a whole different animal. However, once I figure out how to get a good outline, it’s going to work just as well here.

But, to be honest, while the atomic idea works in general, I don’t think it’s the best solution for these types of religious texts that get broken down by verse (that’s what all the models seem to be trained to do). My thinking, just based upon experience, is that standard chunking of multiple verses may actually be the better embedding strategy.

But, who knows for sure?
