BUG: Incorrect programming language for code blocks when using Markdown

I asked ChatGPT if it supported Markdown and got this response:

Only one of the Markdown examples correctly identified markdown as the programming language. Sometimes, it is even identified as lua!

I’m working on a web extension that will render markdown code blocks, but it will not be feasible if the programming language isn’t correctly set to markdown.

Here is the example where markdown is identified as lua:

Although I can’t provide insight on the why, I’ve successfully instructed the API to return all results formatted in HTML. With that being said, if you show the API how you want to receive each response, that may do the trick. In my case, I provided numerous examples where each example response I have provided from “assistant” is formatted with headings, paragraphs and lists tags.

I’m not using GPT-4 yet, but while playing around with ChatGPT-4, I found that by asking it for content, formatted in markdown, did provide consistent results.

By asking the API if it knows how to respond as desired may simply return an undesirable response in contrast of instructing the api how to return a response. That is, unless I have misread the issues you’re running into :stuck_out_tongue:

1 Like

Are you saying that ChaGPT corrects itself based on our feedback? What prompt would you use to get the desired response in the example below:

Oh my god the suggested programming language in the corner is hilarious. I don’t even understand what they’re doing to guess - but it’s not the best :rofl: At least it’s trying though.

1 Like

It would appear that the problem arises because markdown isn’t being considered as a programming language. That may be accurate but it doesn’t help in correctly identifying the type of content in a code block.

I asked the following question where I explicitly asked ChatGPT to set the programming language to markdown:

give 3 examples of markdown and always set the programming language in the code blocks to markdown

And this was the answer:

:sweat_smile:

1 Like

It’d be interesting to look into. I was receiving some very orthodox typescript and it was labelling it as scss :joy: was enjoyable. I wonder if they’re using a separate AI for classification, or simply just asking GPT to guess?

It’s strange - I’m certain that if I were to copy and paste the same code that it has declared as scss, it would say “No, this is not scss”

ChatGPT correctly identifies that the code is markdown, but because it’s not a programming language but rather a markup language, it prevents itself from setting it as the “programming language” for a code block and the result ends up being more or less random!

Nice testing.

I’m hoping to get some wildly inaccurate identifications again, so far it’s been accurate.

I had no issues with HTML or YAML! Maybe it’s just a Markdown thing!

Ahhhh I wish I understood a bit more.

I wonder, when it does write the guessed language at the top, it writes it before it has actually produced (or shows) any tokens related to the code.

So, is it guessing based on its answer, or based on the question? :thinking:
Or… dun dun dun… both

Here’s a fun test:

Give it a code block in a language (for example Python)
Ask it to write the code in another programming language (don’t name any)

It will “guess” the code based on the prompt in this case it will say that it’s Python regardless of the language that it actually types out.

Apparently some of my typescript related questions are more suitable for scss…yikes…

Still would need more testing to know if it’s actually true.

Actually, it will update itself live.
I gave it JS, told it to re-write it into another language, it chose Python.

For the first 2-3 lines it showed “javascript”, and then it immediately switched to “Python”. So it must be initially guessing, and then evaluating the produced code every line. Very neat. Still doesn’t explain why I once got a decent (5-8 lines) of typescript believed as SCSS

I was suggesting that if your goal is to receive responses in markdown, showing the API what you want by providing detailed examples (and experimenting with temperature, etc…) has proven successful in my experience. This, of course, is based on my experience using the GPT-3.5-turbo API and not the ChatGPT UI.

This is not GPT’s doing, I’ve run into this on multiple other sites, the “bug” in this instance is actually just part of how the code block feature in the extended markdown syntax is implemented. When you start a multiline code block with 3 backticks you can specify the language like this:

```java

The markdown interpreter (not GPT) will try to guess the programming language if a language key isn’t specified, it’s not very good at this and will often fail, we can see that if we prompt chatGPT:

please phrase 10 code blocks without a language key, each containing 1 line of code in a different language. Above every code block there should be a headline in bold stating the language within the subsequent code block.

The answer will look something like this:

As we can see chatGPT is very aware of the language within the code block, it’s just very cautious about generating code blocks with language keys.

This can be fixed by telling the model what language key to use, we can even tell GPT to use the wrong key:

We’re testing the markdown interpreter, please provide a python function phrased in a js code block.

And we get the response:

I would love to show some responses from the API as well, but I’m one of the many unfortunate people currently experiencing access issues, so if someone else could chime in here that would be amazing :heart:

2 Likes

Great information. Thanks for the clarification.

Although, it does have a language guessed before a single character is typed which from my (very limited) testing seems to come from the prompt.

I imagine something does initially guess it based on the prompt, and then the interpreter updates the state once some content is fed. Have you tried your bottom example with a longer script?

I tried the same without mentioning any languages by name and watched it start with the prompted language, and then become overwritten half-way through into the correct language that it was typing

Thank you, always happy to help!

I suspect you’re correct although I haven’t done any further testing. Markdown is a typesetting language, like TeX but simpler, it needs an interpreter to output to a specific format like a website or a pdf. Markdown doesn’t have color support, that is all handled by the interpreter, we can’t really know how the guessing is performed without knowing what interpreter chatGPT site is running.

1 Like