I have been experimenting with two bots, Marv and Fleur, each a different character. Fleur loves to beautify her responses with HTML for clarity: she prefers bullet points in her explanations, puts them on new lines for extra clarity, and loves bold headings… while Marv doesn't beautify his normal responses but always wraps his script examples in code blocks… I put the bots in Telegram groups… Marv was more popular with the nerds in a Linux group… Fleur was more popular in a more spiritual group… It was fun to experiment for 3 days with a cranky, sarcastic bot and a happy, spiritual bot side by side, each with its own way of presenting the requested information… it worked… Too bad I ran out of credits so fast while people were asking questions and testing their capabilities… Many people were amazed by the power of AI… heck, testing was a lot of fun while it lasted… Be careful… there's no warning or indication of how many tokens you are spending while testing…
The answer is the text you got from the API; then I store the code in a `Code[]` array, where `key` is the code language and `code` is the code text itself.
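The poster's own parsing code isn't included, so here is a minimal sketch of the idea under stated assumptions: the `Code` shape (`key` = language, `code` = text) comes from the post, while the helper name `extractCode`, the fence regex, and the `"text"` fallback for untagged blocks are hypothetical choices for illustration.

```typescript
// Shape described in the post: `key` holds the language, `code` holds the code text.
interface Code {
  key: string;
  code: string;
}

// Built at runtime so this snippet contains no literal triple backticks.
const FENCE = "`".repeat(3);

// Hypothetical helper: collects every fenced block in the API answer.
// Assumes fences look like <fence>lang\n...<fence>, with the language tag optional.
function extractCode(answer: string): Code[] {
  const blockRe = new RegExp(FENCE + "([\\w+-]*)[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");
  const blocks: Code[] = [];
  let match: RegExpExecArray | null;
  while ((match = blockRe.exec(answer)) !== null) {
    blocks.push({
      key: match[1] || "text", // assumed fallback when the block has no language tag
      code: match[2].replace(/\s+$/, ""), // drop trailing newline/whitespace
    });
  }
  return blocks;
}
```

Because the lazy `([\s\S]*?)` stops at the first closing fence, this sketch inherits the "backticks inside the code" weakness discussed later in the thread.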
Happy hacking
This was good, thanks. I modified it slightly to return the original response if it didn't match, assumed the same language is returned for all code blocks, and combined the blocks into a single string with some formatting. I minimally tested it with responses that had no markdown to strip out, as well as with responses that have markdown plus extra chat text. In case anyone wants it.
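The modified version itself isn't shown here, so the following is only a hypothetical sketch of the behaviour described: `combineCodeBlocks` is an invented name, and the blank line between blocks is an assumed stand-in for "some formatting".

```typescript
// Built at runtime so this snippet contains no literal triple backticks.
const FENCE = "`".repeat(3);

// Hypothetical helper: returns the response unchanged when it has no fenced block;
// otherwise joins every block body into one string. Per-block language tags are
// dropped, on the assumption that all blocks share the same language.
function combineCodeBlocks(response: string): string {
  const blockRe = new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");
  const bodies: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = blockRe.exec(response)) !== null) {
    bodies.push(match[1].trim());
  }
  // No markdown to strip: hand back the original text.
  if (bodies.length === 0) return response;
  // Assumed formatting: a blank line between consecutive blocks.
  return bodies.join("\n\n");
}
```

Chat text before, between, and after the blocks is discarded, which matches the "markdown and extra chat text" case mentioned above.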
Yeah, good catch on the corner cases. Setting inline code aside for a second, both solutions actually have bugs. The regex version fails when the code itself contains an extra "```". The loop version fails when there are multiple code blocks. For example:
parseCode is the regex impl
parseCode2 is the loop impl
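To make the regex failure concrete, here is a small reproduction. `lazyRegexExtract` is a hypothetical stand-in for a non-greedy regex extractor, not the thread's `parseCode` (which isn't included), and the sample reply is made up.

```typescript
// Built at runtime so this snippet contains no literal triple backticks.
const FENCE = "`".repeat(3);

// A reply whose code itself contains a triple-backtick string literal.
const reply = [
  "Here is the code:",
  FENCE + "ts",
  'const fence = "' + FENCE + '";',
  "console.log(fence);",
  FENCE,
].join("\n");

// Hypothetical non-greedy extractor: returns the body of the first fenced block.
function lazyRegexExtract(text: string): string | null {
  const re = new RegExp(FENCE + "\\w*\\n([\\s\\S]+?)" + FENCE);
  const match = re.exec(text);
  return match ? match[1] : null;
}

// The lazy group ends at the first backtick run it meets, i.e. inside the
// string literal, so everything after `const fence = "` is lost.
console.log(JSON.stringify(lazyRegexExtract(reply)));
```

A line-based scanner that only treats lines *starting* with the fence as delimiters keeps the string literal intact, which is the approach taken next.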
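```ts
// Extracts the contents of every fenced code block; returns the input unchanged if there are none.
export function parseCode3(code: string) {
  // No fenced block at all: return the original response untouched.
  if (!code.match(/```([\s\S]+?)```/g))
    return code;

  const filteredLines: string[] = [];
  let inCodeBlock = false;

  for (const line of code.split('\n')) {
    // A line starting with the fence opens or closes a block; the fence line itself is dropped.
    if (line.startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      // After closing a block, add a separator before the next block's contents.
      if (!inCodeBlock)
        filteredLines.push('\n');
      continue;
    }
    // Keep only lines inside a block; chat text outside is discarded.
    if (inCodeBlock)
      filteredLines.push(line);
  }

  return filteredLines.join('\n');
}
```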
And just for amusement, given why we're all here: this is the best I was able to get gpt-3.5-turbo to do in solving this parsing problem. It didn't really work, but it had the right considerations.
```ts
export function parseMarkdownCodeBlocks(markdownText: string): string {
  const codeBlockRegex = /```(.*?)\s*([\s\S]+?)```/g;
  const codeBlocks: { language: string | null; code: string }[] = [];
  let match: RegExpExecArray | null;
  let lastIndex = 0;
  while ((match = codeBlockRegex.exec(markdownText)) !== null) {
    // Capture everything before the code block
    const beforeCodeBlock = markdownText.substring(lastIndex, match.index);
    // Capture the language name (if specified)
    const languageName = match[1].trim().toLowerCase(); // Convert to lowercase
    // Exclude code blocks with language names containing "example"
    if (!languageName.includes('example')) {
      // Capture the code block content
      const codeBlockContent = match[2];
      // Exclude triple backticks within a string inside the code
      const codeBlockWithTripleBackticksFixed = codeBlockContent.replace(/```/g, '``\`'); // Replace ``` with ```
      // Exclude the language name from the code block
      const languageExcludedCodeBlock = {
        language: null,
        code: `${languageName ? '```' + languageName + '\n' : ''}${codeBlockWithTripleBackticksFixed}`,
      };
      codeBlocks.push(languageExcludedCodeBlock);
    }
    // Update the lastIndex to the end of the match
    lastIndex = match.index + match[0].length;
  }
  // Join the code blocks with two newlines
  const codeBlockString = codeBlocks
    .map((block) => block.code)
    .join('\n\n');
  return codeBlockString;
}
```
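For the curious, here is a quick check of two reasons the generated function doesn't work. The pattern is the same one used in `parseMarkdownCodeBlocks`, rebuilt with a `FENCE` helper so the snippet has no literal triple backticks; the sample input is made up.

```typescript
// Built at runtime so this snippet contains no literal triple backticks.
const FENCE = "`".repeat(3);

// Same pattern as in parseMarkdownCodeBlocks (without the g flag, one match is enough here).
const codeBlockRegex = new RegExp(FENCE + "(.*?)\\s*([\\s\\S]+?)" + FENCE);

const sample = FENCE + "ts\nlet x = 1;\n" + FENCE;
const match = codeBlockRegex.exec(sample);

// The lazy (.*?) is satisfied by zero characters, so the language group is
// always empty and the "ts" tag leaks into the code group instead.
console.log(JSON.stringify(match?.[1]));
console.log(JSON.stringify(match?.[2]));

// '``\`' is just three backticks ("\`" escapes a backtick in a JS string),
// so the "fix" for embedded fences replaces the fence with itself.
const escapedFence = '``\`';
console.log(escapedFence === FENCE);
```

So the language filter never sees a language, and the backtick "escaping" is a no-op, even though the comments describe the right concerns.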