Extract code from Text gpt response

Hey :slight_smile:
When receiving a response back from the text GPT API, is there a way to know which parts are code and which parts are text?

In ChatGPT, code blocks are highlighted when it writes back, so I assume there is a way to do so.

Do you have any idea how to do so? Thanks in advance!

In your prompt, tell it to wrap any code examples in <code></code> blocks and it will do so :)

That's a good idea. I tried it previously, but sometimes it does not close the ending block, and it does not seem to be due to running out of tokens.

But I will try again, thanks for the reply!
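For the case where the closing tag goes missing, one option is to parse tolerantly on your side. This is only a minimal sketch: the helper name extractTaggedCode is made up, and treating an unclosed <code> as running to the end of the response is an assumption about what you would want, not something the API guarantees.

```typescript
// Hypothetical helper: collects the contents of <code>...</code> spans.
// Assumption: if the model forgets the closing tag, the rest of the
// response is treated as code.
function extractTaggedCode(response: string): string[] {
  const open = "<code>";
  const close = "</code>";
  const blocks: string[] = [];
  let pos = 0;

  while (true) {
    const start = response.indexOf(open, pos);
    if (start === -1) break; // no more opening tags

    const bodyStart = start + open.length;
    const end = response.indexOf(close, bodyStart);
    if (end === -1) {
      // Missing closing tag: take everything up to the end of the response
      blocks.push(response.slice(bodyStart).trim());
      break;
    }

    blocks.push(response.slice(bodyStart, end).trim());
    pos = end + close.length;
  }

  return blocks;
}
```

Everything outside the returned spans can be treated as plain text.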

I have been experimenting with two bots, Marv and Fleur, both different characters. Fleur loves to beautify her responses for clarity with HTML; she prefers to use bullet points in her explanations, puts them on new lines for extra clarity, and loves bold headings... while Marv doesn't beautify his normal responses but does always wrap his script examples within the code blocks... I put the bots in Telegram groups... Marv was more popular with the nerds in a Linux group... Fleur was more popular in a more spiritual group... it was fun to experiment for 3 days with having a cranky sarcastic bot and a spiritual happy bot next to each other that both have different ways to present the requested information... it worked... too bad I ran out of credits so fast while people were asking questions and testing their capabilities... many people were amazed by the power of AI... heck, testing was a lot of fun while it lasted... be careful... there's no warning or indication of how many tokens you are spending while testing...


Does anyone have any idea how to solve this?

I just tell the assistant exactly what I want by appending more instructions:

{ role: "user", content: `${instruction}. Only respond with code as plain text without code block syntax around it.` },
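As a slightly fuller sketch of that idea, the message can be built by a small helper. The names ChatMessage, CODE_ONLY_SUFFIX and buildCodeOnlyMessage are made up for illustration, and the model may still ignore the instruction, so this does not replace parsing the reply.

```typescript
// Shape of one chat message; the role names follow the chat API convention.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// The extra instruction quoted in the post above
const CODE_ONLY_SUFFIX =
  "Only respond with code as plain text without code block syntax around it.";

// Hypothetical helper: appends the code-only instruction to the user's request
function buildCodeOnlyMessage(instruction: string): ChatMessage {
  const trimmed = instruction.trim();
  // Avoid doubling punctuation when the request already ends a sentence
  const separator = /[.!?]$/.test(trimmed) ? " " : ". ";
  return { role: "user", content: trimmed + separator + CODE_ONLY_SUFFIX };
}
```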


Here is someone who actually does this: Chat-Bot-using-gpt-3.5-turbo/index.html at main · mnick-yt/Chat-Bot-using-gpt-3.5-turbo · GitHub, lines 88-97

When ChatML Markdown becomes publicly available, and if it is supported for your model, it might be a solution to your problem.

GitHub has a variation of the URL that can select specific lines, e.g. (#L<start>-L<end>), i.e. #L88-L97 for the lines mentioned above.
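To illustrate that fragment format, here is a small sketch that builds such a link. The helper name githubLineUrl and its parameters are made up, and the /blob/<ref>/<path> layout is the usual GitHub file URL shape rather than anything quoted in this thread.

```typescript
// Hypothetical helper: builds a GitHub file URL that highlights a line range.
// Assumes the common https://github.com/<owner>/<repo>/blob/<ref>/<path> layout.
function githubLineUrl(
  owner: string,
  repo: string,
  ref: string,
  path: string,
  start: number,
  end?: number
): string {
  const base = "https://github.com/" + owner + "/" + repo + "/blob/" + ref + "/" + path;
  // A single line uses #L<start>; a range uses #L<start>-L<end>
  const fragment =
    end !== undefined && end !== start ? "#L" + start + "-L" + end : "#L" + start;
  return base + fragment;
}
```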

I created a ChatGPT clone and used this to extract the code from the text received from the OpenAI API:

type Code = {
  key: string;
  code: string;
};

let codes: Code[] = [];
if (answer) {
  // Grab every backtick-fenced block, fences included
  const textCode = answer.match(/```([\s\S]+?)```/g);
  if (textCode && textCode.length > 0) {
    codes = textCode
      .join(" ")
      .split("```")
      .map((code) => code.trim())
      .filter((code) => code !== "")
      // The first line of each block is the language tag, the rest is the code
      .map((c) => ({ key: c.slice(0, c.indexOf("\n")), code: c.slice(c.indexOf("\n")) }));
  }
  console.log(codes);
}

Here, answer is the text you got from the API. I then store the code in an array of Code[], where key is the code language and code is the code text itself.
Happy hacking :wink:


This was good, thanks. I modified it slightly to return the original response if it didn't match, and I assumed the same language being returned for all code blocks and combined them into a single string with some formatting. I minimally tested it with responses that didn't have markdown to strip out, as well as with those that have markdown and extra chat text. In case anyone wants it:

function parseCode(response: string) {
  type Code = {
    key: string;
    code: string;
  };
  let codes: Code[] = [];
  const textCode = response.match(/```([\s\S]+?)```/g);
  if (!textCode)
    return response;
  
  if (textCode && textCode?.length > 0) {
    codes = textCode
      .join(" ")
      .split("```")
      .map((code) => code.trim())
      .filter((code) => code != "").map(c => ({"key": c.slice(0,c.indexOf('\n')), "code": c.slice(c.indexOf('\n'))}));
  }
  
  return codes.map(code => code.code).join('\n\n').trim();
}

Couple of things to be careful of when using a simple regex for markdown removal:

  1. code segments can also be "inline", denoted by a single `
  2. ``` could be part of the code block, i.e. the code uses ``` internally.

To handle those, you could modify the code:

function parseCode(response: string) {
  type Code = {
    key: string;
    code: string;
  };

  const codes: Code[] = [];
  
  while (true) {
    const start = response.indexOf('```');
    const end = response.lastIndexOf('```');

    if (start === -1 || end === -1 || start >= end) {
      // No more blocks to find or incorrectly nested backticks
      break;
    }

    // Extract the block between the backticks
    const block = response.slice(start + 3, end).trim();
    const newlineIndex = block.indexOf('\n');
    
    if (newlineIndex !== -1) {
      codes.push({
        key: block.slice(0, newlineIndex),
        code: block.slice(newlineIndex + 1)
      });
    } else {
      codes.push({
        key: block,
        code: ''
      });
    }

    // Remove this block from the response for subsequent iterations
    response = response.slice(0, start) + response.slice(end + 3);
  }

  return codes.map(code => code.code).join('\n\n').trim();
}
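The first caveat above (inline code marked by a single backtick) is not covered by the loop version, which only looks at triple backticks. A rough sketch of pulling out inline spans separately is below. The name extractInlineCode is made up, and it assumes fences start at the beginning of a line and that inline spans do not cross lines.

```typescript
// Hypothetical helper: returns the contents of single-backtick inline spans.
// Assumptions: fenced blocks open and close at the start of a line, and an
// inline span never contains a newline or another backtick.
function extractInlineCode(response: string): string[] {
  // Drop fenced blocks first so backticks inside them are not mistaken for inline code
  const withoutFences = response.replace(/^`{3}[^\n]*\n[\s\S]*?^`{3}[ \t]*$/gm, "");

  const spans: string[] = [];
  const inline = /`([^`\n]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = inline.exec(withoutFences)) !== null) {
    spans.push(match[1]);
  }
  return spans;
}
```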

Yeah, good catch on the corner cases. Forgetting inline for a second, both solutions actually have bugs. The regex version fails with an extra '```' in the code. The loop version fails if there are multiple code blocks. For example:

parseCode is the regex impl
parseCode2 is the loop impl

const testOneBlock = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello world');\n```";
const testOneBlockWithExtraTicks = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello ```world');\n```";
const testTwoBlocks = "Sure! I'm a helpful chatbot.\n```typescript\nconsole.log('hello world');\n```\n\nI'm still really helpful.\n```typescript\nconst x = 'yo';\nconsole.log(x);\n```\n\nMore unhelpful chat";

(async function() {
  console.log(`parseCode(testOneBlock): \n${parseCode(testOneBlock)}`);
  console.log(`parseCode(testOneBlockWithExtraTicks): \n${parseCode(testOneBlockWithExtraTicks)}`);
  console.log(`parseCode(testTwoBlocks): \n${parseCode(testTwoBlocks)}`);
  console.log(`parseCode2(testOneBlock): \n${parseCode2(testOneBlock)}`);
  console.log(`parseCode2(testOneBlockWithExtraTicks): \n${parseCode2(testOneBlockWithExtraTicks)}`);
  console.log(`parseCode2(testTwoBlocks): \n${parseCode2(testTwoBlocks)}`);
})()

Output:

parseCode(testOneBlock): 
console.log('hello world');
parseCode(testOneBlockWithExtraTicks): 
console.log('hello
parseCode(testTwoBlocks): 
console.log('hello world');


const x = 'yo';
console.log(x);
parseCode2(testOneBlock): 
console.log('hello world');
parseCode2(testOneBlockWithExtraTicks): 
console.log('hello ```world');
parseCode2(testTwoBlocks): 
console.log('hello world');
\`\`\`

I'm still really helpful.
\`\`\`typescript
const x = 'yo';
console.log(x);

This seems to work for these test cases tho:

export function parseCode3(code: string) {
  if (!code.match(/```([\s\S]+?)```/g))
    return code;

  const filteredLines: string[] = [];
  let inCodeBlock = false;
  for (let line of code.split('\n')) {
    if (line.startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      if (!inCodeBlock)
        filteredLines.push('\n');
        
      continue;
    }

    if (inCodeBlock)
      filteredLines.push(line);
  }

  return filteredLines.join('\n');
}
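Going back to the original question (which parts are code and which are text), the same line-by-line idea can keep both kinds of content instead of dropping the chat text. This is only a sketch: the Segment type and the name splitSegments are made up, it assumes a fence always starts at the beginning of a line, and it treats an unclosed fence as code running to the end of the response.

```typescript
// One piece of the response, in original order
type Segment = { kind: "text" | "code"; language: string; content: string };

// Hypothetical helper: splits a response into ordered text and code segments.
function splitSegments(response: string): Segment[] {
  const segments: Segment[] = [];
  let kind: "text" | "code" = "text";
  let language = "";
  let buffer: string[] = [];

  // Emit the buffered lines as one segment; whitespace-only text is skipped
  const flush = () => {
    const content = buffer.join("\n");
    if (kind === "code" || content.trim() !== "") {
      segments.push({ kind, language, content: kind === "text" ? content.trim() : content });
    }
    buffer = [];
  };

  for (const line of response.split("\n")) {
    if (/^`{3}/.test(line)) {
      flush();
      if (kind === "text") {
        // Opening fence: whatever follows the backticks is the language tag
        kind = "code";
        language = line.slice(3).trim();
      } else {
        // Closing fence: back to plain text
        kind = "text";
        language = "";
      }
      continue;
    }
    buffer.push(line);
  }

  // If the last fence was never closed, the remainder is emitted as code
  flush();
  return segments;
}
```

A UI could then render each segment by its kind, for example highlighting only the code ones.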

And just for amusement, given why we're all here: this is the best I was able to get gpt-3.5-turbo to do in solving this parsing problem. It didn't really work, but it had the right considerations.

export function parseMarkdownCodeBlocks(markdownText: string): string {
  const codeBlockRegex = /```(.*?)\s*([\s\S]+?)```/g;
  const codeBlocks: { language: string | null; code: string }[] = [];

  let match: RegExpExecArray | null;
  let lastIndex = 0;

  while ((match = codeBlockRegex.exec(markdownText)) !== null) {
    // Capture everything before the code block
    const beforeCodeBlock = markdownText.substring(lastIndex, match.index);

    // Capture the language name (if specified)
    const languageName = match[1].trim().toLowerCase(); // Convert to lowercase

    // Exclude code blocks with language names containing "example"
    if (!languageName.includes('example')) {
      // Capture the code block content
      const codeBlockContent = match[2];

      // Exclude triple backticks within a string inside the code
      const codeBlockWithTripleBackticksFixed = codeBlockContent.replace(/```/g, '``\`'); // Replace ``` with ```

      // Exclude the language name from the code block
      const languageExcludedCodeBlock = {
        language: null,
        code: `${languageName ? '```' + languageName + '\n' : ''}${codeBlockWithTripleBackticksFixed}`,
      };

      codeBlocks.push(languageExcludedCodeBlock);
    }

    // Update the lastIndex to the end of the match
    lastIndex = match.index + match[0].length;
  }

  // Join the code blocks with two newlines
  const codeBlockString = codeBlocks
    .map((block) => block.code)
    .join('\n\n');

  return codeBlockString;
}
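For comparison, a regex can get through the three test cases from earlier in the thread if the fences are anchored to the start of a line with the multiline flag. This is a sketch rather than a full markdown parser: the names CodeBlock and extractFencedBlocks are made up, and it assumes "\n" line endings and fences of exactly three backticks at the start of a line.

```typescript
type CodeBlock = { language: string; code: string };

// Hypothetical helper: extracts fenced blocks with a line-anchored regex.
// Triple backticks in the middle of a line (e.g. inside a string literal)
// are ignored because the closing fence must begin a line.
function extractFencedBlocks(markdown: string): CodeBlock[] {
  // With the m flag, ^ and $ match at line boundaries, not only at the ends of the input
  const fenced = /^`{3}([^\n]*)\n([\s\S]*?)^`{3}[ \t]*$/gm;
  const blocks: CodeBlock[] = [];

  let match: RegExpExecArray | null;
  while ((match = fenced.exec(markdown)) !== null) {
    blocks.push({
      language: match[1].trim(),
      // The body capture includes the newline before the closing fence; drop it
      code: match[2].replace(/\n$/, ""),
    });
  }
  return blocks;
}
```

Joining the code fields with two newlines gives the same kind of combined string as the earlier versions.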