BUG: Incorrect programming language for code blocks when using Markdown

odebroqueville · April 4, 2023, 1:41pm

I asked ChatGPT if it supported Markdown and got this response:

Only one of the Markdown examples correctly identified markdown as the programming language. Sometimes, it is even identified as lua!

I’m working on a web extension that will render markdown code blocks, but it will not be feasible if the programming language isn’t correctly set to markdown.

odebroqueville · April 4, 2023, 1:42pm

Here is the example where markdown is identified as lua:

AN020C · April 4, 2023, 2:02pm

Although I can’t provide insight on the why, I’ve successfully instructed the API to return all results formatted in HTML. With that being said, if you show the API how you want to receive each response, that may do the trick. In my case, I provided numerous examples where each example response I have provided from “assistant” is formatted with headings, paragraphs and lists tags.

I’m not using GPT-4 yet, but while playing around with ChatGPT-4, I found that by asking it for content, formatted in markdown, did provide consistent results.

By asking the API if it knows how to respond as desired may simply return an undesirable response in contrast of instructing the api how to return a response. That is, unless I have misread the issues you’re running into

odebroqueville · April 4, 2023, 2:39pm

Are you saying that ChaGPT corrects itself based on our feedback? What prompt would you use to get the desired response in the example below:

anon10827405 · April 4, 2023, 2:42pm

Oh my god the suggested programming language in the corner is hilarious. I don’t even understand what they’re doing to guess - but it’s not the best At least it’s trying though.

odebroqueville · April 4, 2023, 3:05pm

It would appear that the problem arises because markdown isn’t being considered as a programming language. That may be accurate but it doesn’t help in correctly identifying the type of content in a code block.

I asked the following question where I explicitly asked ChatGPT to set the programming language to markdown:

give 3 examples of markdown and always set the programming language in the code blocks to markdown

And this was the answer:

anon10827405 · April 4, 2023, 3:08pm

It’d be interesting to look into. I was receiving some very orthodox typescript and it was labelling it as scss was enjoyable. I wonder if they’re using a separate AI for classification, or simply just asking GPT to guess?

It’s strange - I’m certain that if I were to copy and paste the same code that it has declared as scss, it would say “No, this is not scss”

odebroqueville · April 4, 2023, 3:11pm

odebroqueville · April 4, 2023, 3:17pm

ChatGPT correctly identifies that the code is markdown, but because it’s not a programming language but rather a markup language, it prevents itself from setting it as the “programming language” for a code block and the result ends up being more or less random!

anon10827405 · April 4, 2023, 3:21pm

Nice testing.

I’m hoping to get some wildly inaccurate identifications again, so far it’s been accurate.

odebroqueville · April 4, 2023, 3:28pm

I had no issues with HTML or YAML! Maybe it’s just a Markdown thing!

anon10827405 · April 4, 2023, 3:29pm

Ahhhh I wish I understood a bit more.

I wonder, when it does write the guessed language at the top, it writes it before it has actually produced (or shows) any tokens related to the code.

So, is it guessing based on its answer, or based on the question?
Or… dun dun dun… both

anon10827405 · April 4, 2023, 3:35pm

Here’s a fun test:

Give it a code block in a language (for example Python)
Ask it to write the code in another programming language (don’t name any)

It will “guess” the code based on the prompt in this case it will say that it’s Python regardless of the language that it actually types out.

Apparently some of my typescript related questions are more suitable for scss…yikes…

Still would need more testing to know if it’s actually true.

Actually, it will update itself live.
I gave it JS, told it to re-write it into another language, it chose Python.

For the first 2-3 lines it showed “javascript”, and then it immediately switched to “Python”. So it must be initially guessing, and then evaluating the produced code every line. Very neat. Still doesn’t explain why I once got a decent (5-8 lines) of typescript believed as SCSS

AN020C · April 4, 2023, 4:14pm

I was suggesting that if your goal is to receive responses in markdown, showing the API what you want by providing detailed examples (and experimenting with temperature, etc…) has proven successful in my experience. This, of course, is based on my experience using the GPT-3.5-turbo API and not the ChatGPT UI.

N2U · April 4, 2023, 6:12pm

This is not GPT’s doing, I’ve run into this on multiple other sites, the “bug” in this instance is actually just part of how the code block feature in the extended markdown syntax is implemented. When you start a multiline code block with 3 backticks you can specify the language like this:

```java

The markdown interpreter (not GPT) will try to guess the programming language if a language key isn’t specified, it’s not very good at this and will often fail, we can see that if we prompt chatGPT:

please phrase 10 code blocks without a language key, each containing 1 line of code in a different language. Above every code block there should be a headline in bold stating the language within the subsequent code block.

The answer will look something like this:

As we can see chatGPT is very aware of the language within the code block, it’s just very cautious about generating code blocks with language keys.

This can be fixed by telling the model what language key to use, we can even tell GPT to use the wrong key:

We’re testing the markdown interpreter, please provide a python function phrased in a js code block.

And we get the response:

I would love to show some responses from the API as well, but I’m one of the many unfortunate people currently experiencing access issues, so if someone else could chime in here that would be amazing

anon10827405 · April 4, 2023, 7:04pm

Great information. Thanks for the clarification.

Although, it does have a language guessed before a single character is typed which from my (very limited) testing seems to come from the prompt.

I imagine something does initially guess it based on the prompt, and then the interpreter updates the state once some content is fed. Have you tried your bottom example with a longer script?

I tried the same without mentioning any languages by name and watched it start with the prompted language, and then become overwritten half-way through into the correct language that it was typing

N2U · April 4, 2023, 7:42pm

Thank you, always happy to help!

I suspect you’re correct although I haven’t done any further testing. Markdown is a typesetting language, like TeX but simpler, it needs an interpreter to output to a specific format like a website or a pdf. Markdown doesn’t have color support, that is all handled by the interpreter, we can’t really know how the guessing is performed without knowing what interpreter chatGPT site is running.

Topic		Replies	Views
Does ChatGPT have a built-in Markdown parser API	7	19423	May 16, 2024
Suggestions for perfect escaping of code with GPT? Plugins / Actions builders chatgpt , plugin-development , api	11	2011	August 7, 2023
Extract code from Text gpt response API	16	18226	February 5, 2024
How to prevent GPT from outputting responses in Markdown format? Prompting chatgpt , lost-user , output-markdown	15	9909	May 16, 2025
Chat Stream alternating code /code fields from 3 backticks (HTML Javascript css) API api	4	2136	November 20, 2023

BUG: Incorrect programming language for code blocks when using Markdown

Related topics