I am using GPT-4 and it’s returning good responses, except that it sometimes marks text that should be bold with three asterisks instead of HTML tags.
It does the same for titles, using three hashes: ### title
Is there any way to get it to either leave these out or use the relevant HTML tags?
The AI has been trained (and overtrained) to produce only the markdown format used by the ChatGPT renderer.
If you are in ChatGPT, then it is working as OpenAI intends, showing you a display format.
If you are on the API and want the AI to write HTML for your web page as its native response format, this is now very difficult to achieve (although older models that developers still have access to, such as gpt-4-0314, can do it easily with a simple instruction).
You’ll need to state specifically that the desired output is HTML that you’ll be pasting elsewhere, and that the AI’s task is to help you write that HTML. In ChatGPT, it will probably still be enclosed in a markdown code block, because the renderer formats it as “code”.
On the API, you can also just give up and use a markdown-to-HTML library, just as you would when displaying the output to a user.
If you are programming the API with a system message, or Assistants with instructions, the most straightforward approach is to write a separate “response format” section that tells the AI there is no markdown renderer available, that markdown would break the appearance of the output, and that markdown is prohibited, listing the characters it must not produce.
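As a rough sketch of the “response format” idea above, the messages could be assembled like this. The wording of the format text and the helper name build_messages are illustrative assumptions, not a tested recipe, so adjust them to your application.

```python
# Hypothetical format section; the exact wording is an assumption.
RESPONSE_FORMAT = (
    "Response format:\n"
    "No markdown renderer is available, so markdown would break the page.\n"
    "Markdown is prohibited. Never output the characters * or #.\n"
    "Write plain text only, with paragraphs separated by blank lines."
)


def build_messages(user_prompt: str, role_text: str = "You write blog content.") -> list[dict]:
    """Return chat messages with the format rules appended to the system text."""
    system_text = f"{role_text}\n\n{RESPONSE_FORMAT}"
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("Write about 200 words on how to prune roses.")
```

The resulting list can be passed as the messages argument of a chat completions call. Keeping the format rules in their own section makes them easy to reuse across prompts.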
You can additionally penalize the generation of tokens such as asterisks and hash marks with the logit_bias parameter.
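A minimal sketch of building such a logit_bias dictionary follows. The helper build_logit_bias and the stand-in encoder with its token IDs are hypothetical; real token IDs depend on the model’s tokenizer and must come from a real encoder.

```python
from typing import Callable, Iterable


def build_logit_bias(
    strings: Iterable[str],
    encode: Callable[[str], list[int]],
    bias: int = -100,
) -> dict[str, int]:
    """Map each string that encodes to exactly one token to a bias value.

    Multi-token strings are skipped, because biasing their individual
    pieces would also suppress unrelated text.
    """
    result: dict[str, int] = {}
    for text in strings:
        token_ids = encode(text)
        if len(token_ids) == 1:
            result[str(token_ids[0])] = bias
    return result


# Stand-in encoder for demonstration only; these IDs are made up.
FAKE_VOCAB = {"*": 101, "**": 102, "#": 103, "###": 104}


def fake_encode(text: str) -> list[int]:
    return [FAKE_VOCAB[text]] if text in FAKE_VOCAB else [0, 0]


logit_bias = build_logit_bias(["*", "**", "#", "###", "not one token"], fake_encode)
print(logit_bias)
```

With a real tokenizer in place of fake_encode (for example, the encode method of a tiktoken encoding for your model, which is an assumption about your setup), the dictionary can be passed as logit_bias in the API call. A bias of -100 effectively bans a token. The vocabulary also contains other tokens that include these characters, such as variants with a leading space, so this reduces markdown rather than guaranteeing its absence.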
In ChatGPT, you have custom instructions, which are placed in a similar position (though not quite where people assume). In the “how to act” box, you can likewise tell the AI that these characters are forbidden.
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {
            "role": "user",
            "content": f"write a few paragraphs that total approximately 200 words on how to: {heading} for a blog titled: {blog_post_title}"
        },
    ],
    temperature=1,
    max_tokens=1000,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
Sorry, I’m completely new to OpenAI and don’t have a clue where to start or what everything means yet; I’m still learning. Do I add a message with the system role and then, in its content, tell it not to do things like this?
{"role": "system", "content": "Do not use markdown formatting, * or #"},
{"role": "user", "content": "my prompt"},
You can simply state in the instruction that the output must be in HTML format, starting with a div.
The drawback is that HTML output will be more expensive, because it consumes more tokens.
When you get your response, you can sanitize it with replace:
const chatGPTResponse = data.choices[0].message.content.trim();
// bold markdown to B tag
var cleanResponse = chatGPTResponse.replace(/\*\*(.*?)\*\*/g, "<b>$1</b>");
// italic markdown to I tag
cleanResponse = cleanResponse.replace(/\*(.*?)\*/g, "<i>$1</i>");
You can … damage your math and other AI output with replace.
Math that looks like 3<strong>3</strong>5 = 45 doesn’t make much sense.
Nor does 335 = 45, were the AI to write 3*3*5.
Even ** in this sentence, where whitespace** is next to a word* and *enclosing the word or sequence is not affected by this forum’s proper parsing of markdown.
* (word can even be this line starting with * that would normally be a bullet point, altered by a regex because there are two of them here)
** (whitespace - characters such as space, tab, or linefeed)
Use markdown → HTML parsing libraries built for the purpose, which properly identify valid CommonMark containers.
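To make the objection concrete, here is a small Python port of the two replace rules quoted earlier in the thread, run on ordinary text, on arithmetic, and on a bullet line. The function name naive_markdown_to_html is made up for this illustration.

```python
import re


def naive_markdown_to_html(text: str) -> str:
    """Apply the same two regex rules as the JavaScript snippet."""
    text = re.sub(r"\*\*(.*?)\*\*", r"<b>\1</b>", text)  # bold rule
    text = re.sub(r"\*(.*?)\*", r"<i>\1</i>", text)      # italic rule
    return text


# Works for the simple case it was written for.
print(naive_markdown_to_html("This is **important**."))

# Multiplication signs are mistaken for italic markers.
print(naive_markdown_to_html("3*3*5 = 45"))

# A bullet marker pairs up with a later asterisk on the same line.
print(naive_markdown_to_html("* item with a * in it"))
```

The first call produces the intended bold tag, while the second and third swallow the asterisks and wrap unrelated text in italic tags. A real markdown parser avoids this because it decides from context whether an asterisk opens or closes emphasis.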
Hi Jay, thank you for your objection, which is correct as a general principle. In my answer I stuck to the examples given in the question and wanted to illustrate the concept of string sanitization. Obviously, to adapt the concept to all cases, one must study the various possible cases and modify the rules or apply new (subsequent) ones.