Extract code from Text gpt response

matan · December 27, 2022, 2:09pm

Hey,
When receiving a response from the text GPT API, is there a way to know which parts are code and which parts are text?

In ChatGPT, when it writes back, it highlights code blocks, so I assume there is a way to do so.

Do you have any idea how to do this? Thanks in advance!

trail001 · December 27, 2022, 3:27pm

In your prompt, tell it to wrap any code examples in <code></code> blocks and it will do so :)
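
A minimal sketch of the parsing side, assuming the prompt asks for <code></code> tags as suggested here. splitCodeTags and the Segment type are hypothetical names. Since the model does not always emit the closing tag, the sketch treats an unclosed <code> as running to the end of the reply; that is a design choice, not something the API guarantees.

```typescript
// Hypothetical segment type: a run of plain text or a run of code.
type Segment = { kind: "text" | "code"; content: string };

// Split a reply into alternating text/code segments, assuming the model
// was told to wrap code in <code></code> tags.
function splitCodeTags(reply: string): Segment[] {
  const segments: Segment[] = [];
  const open = "<code>";
  const close = "</code>";
  let pos = 0;

  while (pos < reply.length) {
    const start = reply.indexOf(open, pos);
    if (start === -1) {
      // No more code tags: the rest is plain text.
      segments.push({ kind: "text", content: reply.slice(pos) });
      break;
    }
    if (start > pos) {
      segments.push({ kind: "text", content: reply.slice(pos, start) });
    }
    const bodyStart = start + open.length;
    const end = reply.indexOf(close, bodyStart);
    if (end === -1) {
      // Assumption: an unclosed tag means everything after it is code.
      segments.push({ kind: "code", content: reply.slice(bodyStart) });
      break;
    }
    segments.push({ kind: "code", content: reply.slice(bodyStart, end) });
    pos = end + close.length;
  }
  return segments;
}

const parts = splitCodeTags("Try this: <code>print('hi')</code> then run it.");
console.log(parts);
```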

matan · December 27, 2022, 3:30pm

That’s a good idea. I tried it previously, but sometimes it does not close the ending block, and it doesn’t seem to be because it ran out of tokens.

But I will try again, thanks for the reply!

trail001 · December 27, 2022, 3:35pm

I have been experimenting with two bots, Marv and Fleur, each with a different character. Fleur loves to beautify her responses with HTML for clarity: she prefers bullet points in her explanations, puts them on new lines for extra clarity, and loves bold headings. Marv doesn’t beautify his normal responses, but he always wraps his script examples in code blocks. I put the bots in Telegram groups: Marv was more popular with the nerds in a Linux group, and Fleur was more popular in a more spiritual group. It was fun to experiment for 3 days with a cranky, sarcastic bot and a happy, spiritual bot side by side, each presenting the requested information in its own way, and it worked. Too bad I ran out of credits so fast while people were asking questions and testing their capabilities; many people were amazed by the power of AI. Heck, testing was a lot of fun while it lasted. Be careful: there’s no warning or indication of how many tokens you are spending while testing.

matan · January 2, 2023, 1:46pm

Does anyone have any idea how to solve this?

erowell83 · March 25, 2023, 3:49am

I just tell the assistant exactly what I want by appending more instructions:

{ role: "user", content: `${instruction}. Only respond with code as plain text without code block syntax around it.` },
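
A sketch of the same idea in TypeScript, assuming an OpenAI-style array of { role, content } messages; buildMessages and stripOuterFence are hypothetical helpers, and no API call is shown. The fallback strips a single surrounding fence, because the model may still wrap its answer despite the instruction.

```typescript
// Hypothetical message shape, mirroring the { role, content } objects above.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Append the "plain code only" instruction to the user's request.
function buildMessages(instruction: string): ChatMessage[] {
  return [
    {
      role: "user",
      content: `${instruction}. Only respond with code as plain text without code block syntax around it.`,
    },
  ];
}

// Fallback: if the whole reply is one fenced block anyway, return its body.
function stripOuterFence(reply: string): string {
  const trimmed = reply.trim();
  // Opening fence with an optional language tag, body, closing fence at the end.
  const m = trimmed.match(/^```[^\n]*\n([\s\S]*?)\n?```$/);
  return m ? m[1] : trimmed;
}

console.log(buildMessages("Write a function that reverses a string")[0].content);
console.log(stripOuterFence("```js\nconst x = 1;\n```"));
```

The fallback assumes the entire reply is a single block; for replies that mix prose and several blocks, one of the extraction approaches later in the thread is a better fit.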

jacknoordhuizen · March 30, 2023, 4:09pm

Here is someone who actually does this: Chat App, lines 88-97.

EricGT · March 30, 2023, 4:19pm

When ChatML Markdown becomes publicly available, and if it is supported for your model, it might be a solution to your problem.

EricGT · March 30, 2023, 4:22pm

GitHub has a variation of the URL that can select specific lines by appending #L<start>-L<end>, e.g.:

github.com/mnick-yt/Chat-Bot-using-gpt-3.5-turbo/blob/main/ChatGPT API/templates/index.html#L88-L97

// Check if the content has a code block
const hasCodeBlock = content.includes("```");
if (hasCodeBlock) {
  // If the content has a code block, wrap it in a <pre><code> element
  const codeContent = content.replace(/```([\s\S]+?)```/g, '</p><pre><code>$1</code></pre><p>');
  messageDiv.innerHTML = `<img src="{{ url_for('static', filename='images/gpt.jpg') }}" class="bot-icon"><p>${codeContent}</p>`
}
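
The snippet above injects the raw reply into innerHTML, so any <, > or & in the code is parsed as HTML, and the language tag after the opening fence ends up inside the <code> element. A sketch of a safer variant, assuming the same </p><pre><code> ... </code></pre><p> wrapping; escapeHtml and renderReply are hypothetical names, and the language-<tag> class is a convention that highlighters such as highlight.js and Prism recognize. It only matches fences whose language tag is followed by a newline.

```typescript
// Escape HTML special characters so code is shown as text, not parsed.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Turn ```lang\n...``` fences into <pre><code> blocks after escaping,
// moving the language tag into a class attribute instead of the code.
function renderReply(content: string): string {
  const escaped = escapeHtml(content);
  return escaped.replace(
    /```([\w+-]*)\n([\s\S]*?)```/g,
    (_match, lang: string, code: string) => {
      const cls = lang ? ` class="language-${lang}"` : "";
      return `</p><pre><code${cls}>${code}</code></pre><p>`;
    },
  );
}

// Wrapped in <p> like the original snippet before assigning to innerHTML.
const sample = "Use:\n```js\nif (a < b) {}\n```\nDone.";
const html = "<p>" + renderReply(sample) + "</p>";
console.log(html);
```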

Mohamed.MOUROUH · July 14, 2023, 8:40pm

I created a ChatGPT clone and used this to extract the code from the text received from the OpenAI API:

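type Code = {
  key: string;  // language tag after the opening ``` (empty if none)
  code: string; // the code itself
};

let codes: Code[] = [];
if (answer) {
  const textCode = answer.match(/```([\s\S]+?)```/g);
  if (textCode && textCode.length > 0) {
    codes = textCode
      .join(" ")
      .split("```")
      .map((code) => code.trim())
      .filter((code) => code !== "")
      .map((c) => {
        // Guard against a block with no newline (indexOf would return -1)
        const nl = c.indexOf("\n");
        return nl === -1
          ? { key: "", code: c }
          : { key: c.slice(0, nl), code: c.slice(nl + 1) };
      });
  }
  console.log(codes);
}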

answer is the text you got from the API. I then store the code in an array of Code[], where key is the code language and code is the code itself.
Happy hacking
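
The same Code[] shape can also be built with String.prototype.matchAll and capture groups, which take the language tag and the body directly from the regex instead of joining and re-splitting. A sketch; extractCodes is a hypothetical name. It shares the regex approach's limits: it needs a newline after the opening fence, and a ``` inside the code still ends the block early.

```typescript
type Code = {
  key: string;  // language tag after the opening ``` (empty if none)
  code: string; // body of the block
};

function extractCodes(answer: string): Code[] {
  // Group 1: language tag on the fence line; group 2: lazily matched body.
  const fence = /```([^\n`]*)\n([\s\S]*?)```/g;
  return Array.from(answer.matchAll(fence), (m) => ({
    key: m[1].trim(),
    code: m[2].replace(/\n$/, ""), // drop the newline before the closing fence
  }));
}

const answer = "Here you go:\n```python\nprint('hi')\n```\nand\n```\necho hi\n```";
console.log(extractCodes(answer));
```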

barryscary · September 6, 2023, 1:26am

This was good, thanks. I modified it slightly to return the original response if it didn’t match, and I assumed the same language is returned for all code blocks and combined them into a single string with some formatting. I tested it minimally with responses that had no markdown to strip out, as well as with ones that had markdown and extra chat text. In case anyone wants it:

function parseCode(response: string) {
  type Code = {
    key: string;
    code: string;
  };
  let codes: Code[] = [];
  const textCode = response.match(/```([\s\S]+?)```/g);
  if (!textCode)
    return response;

  codes = textCode
    .join(" ")
    .split("```")
    .map((code) => code.trim())
    .filter((code) => code != "")
    .map(c => ({"key": c.slice(0,c.indexOf('\n')), "code": c.slice(c.indexOf('\n'))}));
  
  return codes.map(code => code.code).join('\n\n').trim();
}

Foxalabs · September 6, 2023, 7:43am

A couple of things to be careful of when using a simple regex for markdown removal:

- Code segments can also be inline, denoted by a single backtick (`).
- ``` could be part of the code block itself, i.e. the code uses ``` internally.

To handle those, you could modify the code:

function parseCode(response: string) {
  type Code = {
    key: string;
    code: string;
  };

  const codes: Code[] = [];
  
  while (true) {
    const start = response.indexOf('```');
    const end = response.lastIndexOf('```');

    if (start === -1 || end === -1 || start >= end) {
      // No more blocks to find or incorrectly nested backticks
      break;
    }

    // Extract the block between the backticks
    const block = response.slice(start + 3, end).trim();
    const newlineIndex = block.indexOf('\n');
    
    if (newlineIndex !== -1) {
      codes.push({
        key: block.slice(0, newlineIndex),
        code: block.slice(newlineIndex + 1)
      });
    } else {
      codes.push({
        key: block,
        code: ''
      });
    }

    // Remove this block from the response for subsequent iterations
    response = response.slice(0, start) + response.slice(end + 3);
  }

  return codes.map(code => code.code).join('\n\n').trim();
}
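
The loop version only looks for triple backticks, so inline code from the first point is not extracted. A minimal sketch for collecting inline spans, assuming single-backtick spans that stay on one line, and removing fenced blocks first so their backticks are not misread; extractInlineCode is a hypothetical helper.

```typescript
// Collect `inline code` spans, ignoring anything inside fenced blocks.
function extractInlineCode(text: string): string[] {
  // Remove fenced blocks first so their backticks don't pair up as inline spans.
  const withoutFences = text.replace(/```[\s\S]*?```/g, "");
  const spans: string[] = [];
  // Simplification: one backtick on each side, no newline inside the span.
  for (const m of withoutFences.matchAll(/`([^`\n]+)`/g)) {
    spans.push(m[1]);
  }
  return spans;
}

const reply = "Call `parseCode` on the reply:\n```ts\nparseCode(x);\n```\nIt returns a `string`.";
console.log(extractInlineCode(reply)); // returns ["parseCode", "string"]
```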

barryscary · September 6, 2023, 7:48pm

Yeah, good catch on the corner cases. Setting inline code aside for a second, both solutions actually have bugs. The regex version fails with extra ``` in the code. The loop version fails if there are multiple code blocks. For example:

parseCode is the regex impl
parseCode2 is the loop impl

const testOneBlock = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello world');\n```";
const testOneBlockWithExtraTicks = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello ```world');\n```";
const testTwoBlocks = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello world');\n```\n\nI'm still really helpful.\n```typescript\nconst x = 'yo';\nconsole.log(x);\n```\n\nMore unhelpful chat";

(async function() {
  console.log(`parseCode(testOneBlock): \n${parseCode(testOneBlock)}`);
  console.log(`parseCode(testOneBlockWithExtraTicks): \n${parseCode(testOneBlockWithExtraTicks)}`);
  console.log(`parseCode(testTwoBlocks): \n${parseCode(testTwoBlocks)}`);
  console.log(`parseCode2(testOneBlock): \n${parseCode2(testOneBlock)}`);
  console.log(`parseCode2(testOneBlockWithExtraTicks): \n${parseCode2(testOneBlockWithExtraTicks)}`);
  console.log(`parseCode2(testTwoBlocks): \n${parseCode2(testTwoBlocks)}`);
})()

Output:

parseCode(testOneBlock): 
console.log('hello world');
parseCode(testOneBlockWithExtraTicks): 
console.log('hello
parseCode(testTwoBlocks): 
console.log('hello world');


const x = 'yo';
console.log(x);
parseCode2(testOneBlock): 
console.log('hello world');
parseCode2(testOneBlockWithExtraTicks): 
console.log('hello ```world');
parseCode2(testTwoBlocks): 
console.log('hello world');
```

I'm still really helpful.
```typescript
const x = 'yo';
console.log(x);

This seems to work for these test cases, though:

export function parseCode3(code: string) {
  // No fenced block at all: return the response unchanged
  if (!code.match(/```([\s\S]+?)```/g))
    return code;

  const filteredLines: string[] = [];
  let inCodeBlock = false;
  for (const line of code.split('\n')) {
    if (line.startsWith('```')) {
      // A fence line toggles code mode; add a separator after each closed block
      inCodeBlock = !inCodeBlock;
      if (!inCodeBlock) {
        filteredLines.push('\n');
      }
      continue;
    }

    if (inCodeBlock)
      filteredLines.push(line);
  }

  return filteredLines.join('\n');
}
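
parseCode3 toggles on any line that starts with ```, so a line inside the code that begins with ``` (for example, a markdown snippet) would end the block early. A sketch that follows the CommonMark fenced-code rules more closely: fences may use backticks or tildes and be indented up to 3 spaces, and a block closes only on a fence of the same character that is at least as long as the opening one, so a ```` fence can contain ``` lines. extractFencedBlocks is a hypothetical name, and it ignores CommonMark details such as stripping the opening fence's indentation from content lines and fences nested in lists or block quotes.

```typescript
type CodeBlock = { lang: string; code: string };

function extractFencedBlocks(markdown: string): CodeBlock[] {
  const blocks: CodeBlock[] = [];
  // Opening fence: up to 3 spaces, 3+ backticks or 3+ tildes, then an info string.
  const openRe = /^ {0,3}(`{3,}|~{3,})(.*)$/;
  // Closing fence: up to 3 spaces, 3+ fence characters, trailing whitespace only.
  const closeRe = /^ {0,3}(`{3,}|~{3,})[ \t]*$/;
  let fence: string | null = null; // the opening fence, e.g. "```" or "~~~~"
  let lang = "";
  let lines: string[] = [];

  for (const line of markdown.split(/\r?\n/)) {
    if (fence === null) {
      const m = line.match(openRe);
      // A backtick fence's info string may not contain backticks,
      // so a one-line ```code``` is inline code, not a block.
      if (m && !(m[1][0] === "`" && m[2].includes("`"))) {
        fence = m[1];
        lang = m[2].trim().split(/\s+/)[0] ?? "";
        lines = [];
      }
      continue;
    }
    const close = line.match(closeRe);
    // Close only on the same character, at least as long as the opening fence.
    if (close && close[1][0] === fence[0] && close[1].length >= fence.length) {
      blocks.push({ lang, code: lines.join("\n") });
      fence = null;
    } else {
      lines.push(line);
    }
  }
  // An unclosed block runs to the end of the text.
  if (fence !== null) {
    blocks.push({ lang, code: lines.join("\n") });
  }
  return blocks;
}

// The outer four-backtick fence keeps the inner ``` lines inside the code.
const nested = "Sure!\n````markdown\nUse:\n```ts\nx();\n```\n````\nDone.";
console.log(extractFencedBlocks(nested));
```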

barryscary · September 6, 2023, 10:37pm

And just for amusement, given why we’re all here: this is the best I was able to get gpt-3.5-turbo to do in solving this parsing problem. It didn’t really work, but it had the right considerations.

export function parseMarkdownCodeBlocks(markdownText: string): string {
  const codeBlockRegex = /```(.*?)\s*([\s\S]+?)```/g;
  const codeBlocks: { language: string | null; code: string }[] = [];

  let match: RegExpExecArray | null;
  let lastIndex = 0;

  while ((match = codeBlockRegex.exec(markdownText)) !== null) {
    // Capture everything before the code block
    const beforeCodeBlock = markdownText.substring(lastIndex, match.index);

    // Capture the language name (if specified)
    const languageName = match[1].trim().toLowerCase(); // Convert to lowercase

    // Exclude code blocks with language names containing "example"
    if (!languageName.includes('example')) {
      // Capture the code block content
      const codeBlockContent = match[2];

      // Exclude triple backticks within a string inside the code
      const codeBlockWithTripleBackticksFixed = codeBlockContent.replace(/```/g, '``\`'); // Replace ``` with ```

      // Exclude the language name from the code block
      const languageExcludedCodeBlock = {
        language: null,
        code: `${languageName ? '```' + languageName + '\n' : ''}${codeBlockWithTripleBackticksFixed}`,
      };

      codeBlocks.push(languageExcludedCodeBlock);
    }

    // Update the lastIndex to the end of the match
    lastIndex = match.index + match[0].length;
  }

  // Join the code blocks with two newlines
  const codeBlockString = codeBlocks
    .map((block) => block.code)
    .join('\n\n');

  return codeBlockString;
}

l.dijkman · November 17, 2023, 6:00pm

I’m breaking my head over this one.

I asked GPT:

"String with multiple three backticks, replace for alternating code /code"

Problem: my chat is streamed, and a stream message does not always contain all 3 backticks.

Ace editor with AI:

ldijkman.github.io/Ace_Seventh_Heaven/The_AfterWorld.html
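
One way to handle the streaming case (a sketch, independent of the linked editor) is to buffer chunks and only classify complete lines, so a ``` split across two chunks is still recognized once its line is finished. FenceStreamParser and its event types are hypothetical; the sketch assumes fences are triple backticks on their own line, and a fence-like line with an info string inside a code block is kept as code. The trade-off is that each line is only emitted once its newline arrives.

```typescript
// Events a UI (e.g. an editor alternating text and code fields) could consume.
type StreamEvent =
  | { type: "text"; line: string }
  | { type: "codeStart"; lang: string }
  | { type: "code"; line: string }
  | { type: "codeEnd" };

class FenceStreamParser {
  private buffer = ""; // unterminated tail of the stream so far
  private inCode = false;

  // Feed one streamed chunk; returns events for every line completed so far.
  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const events: StreamEvent[] = [];
    let nl = this.buffer.indexOf("\n");
    while (nl !== -1) {
      const line = this.buffer.slice(0, nl).replace(/\r$/, "");
      this.buffer = this.buffer.slice(nl + 1);
      events.push(...this.handleLine(line));
      nl = this.buffer.indexOf("\n");
    }
    return events;
  }

  // Call once the stream has finished: flushes the last line and closes
  // an unterminated code block.
  end(): StreamEvent[] {
    const tail = this.buffer.replace(/\r$/, "");
    this.buffer = "";
    const events: StreamEvent[] = tail ? this.handleLine(tail) : [];
    if (this.inCode) {
      events.push({ type: "codeEnd" });
      this.inCode = false;
    }
    return events;
  }

  private handleLine(line: string): StreamEvent[] {
    // Fence line: up to 3 spaces, ```, then an optional info string without backticks.
    const fence = line.match(/^ {0,3}```([^`]*)$/);
    const info = fence ? fence[1].trim() : "";
    if (fence && !this.inCode) {
      this.inCode = true;
      return [{ type: "codeStart", lang: info.split(/\s+/)[0] ?? "" }];
    }
    if (fence && info === "") {
      this.inCode = false;
      return [{ type: "codeEnd" }];
    }
    return [this.inCode ? { type: "code", line } : { type: "text", line }];
  }
}

// Simulated stream in which both fences are split across chunk boundaries.
const parser = new FenceStreamParser();
const chunks = ["Here:\n`", "``ts\nconst a", " = 1;\n``", "`\nDone."];
const events = [...chunks.flatMap((c) => parser.push(c)), ...parser.end()];
console.log(events);
```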

johnnyzheng98 · February 5, 2024, 3:51am

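Here’s a simple one:

import re
# Split on fences (with an optional language tag); index 1 is the first code block (IndexError if none)
code = re.split(r'```[a-zA-Z0-9- ]*', gpt_response, maxsplit=10)[1]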
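
The same split works in TypeScript, and keeping every odd index returns all code blocks instead of only the first: splitting on every fence leaves prose at even indices and code at odd ones. A sketch; splitCodeBlocks is a hypothetical name. Unlike the one-liner, it returns an empty array when there is no fence instead of raising an error, and it trims each block. Like the other regex versions, a ``` inside the code breaks it.

```typescript
function splitCodeBlocks(gptResponse: string): string[] {
  return gptResponse
    .split(/```[a-zA-Z0-9- ]*/) // same fence pattern as the Python version
    .filter((_, i) => i % 2 === 1) // odd indices hold the text between fences
    .map((block) => block.trim());
}

console.log(splitCodeBlocks("Intro\n```python\nx = 1\n```\nMore text\n```\ny = 2\n```"));
```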
