Streaming Markdown, Converting to HTML

I am getting streaming markdown from the Assistants API. The chunks arrive with lots of extraneous spaces in them, but I ultimately need to put the chunks together and then render the markdown to HTML. When I parse the markdown with markdown-it or marked, the extra spaces cause the rendered HTML to have `<p>` tags where none should really exist. That makes the HTML look particularly ugly. I am not trying to render each chunk as it streams – I am just trying to render the concatenated result.

Here’s an example of the markdown that I receive. Notice the spaces between Friday, August 19. Any suggestions as to how I should approach this problem? Perhaps I am missing something more fundamental?

### Movie Night Information

- The final Movie Night of the summer will feature a **Frozen Sing-A-Long**.
- **Date**: Friday,

 August

19
- **Location**: El Quito Park (12855 Paseo Presada)
- **Event Timing**:

 - Begins at **7:30 p.m.

** with activities and resource tables.

- The movie will start at **sundown**.

Please come early to find a good spot and enjoy the event【4:0†source】【4:8†source】【4:9†source】.

Usually we use a markdown library to render the output in the UI. Do you not want to do that?


There might be some problems occurring during the streaming process.
Have you tried the method without streaming to see if the results are the same?

That’s what I am doing at the end of the stream. But the final, accumulated markdown is not clean. It has extraneous spaces in it.

Hi,
here is an example of converting markdown to HTML 🙂
See GitHub: vincmag360/CSHARP-WinForm-MISTRAL-AI-TEST-LLM

You will need to append all the streaming chunks together, then parse the entire markdown response into HTML using marked.js or a comparable library for the language you are using.
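A minimal sketch of that accumulate-then-parse approach, written in Python only for brevity. It assumes the chunks arrive as plain strings; `accumulate_markdown` and `render_when_complete` are hypothetical helper names, and `render` stands in for whatever markdown library you call (marked, markdown-it, or similar). The one detail worth checking is that the chunks are joined with an empty string. Joining them with a newline or a space, or trimming each chunk, would split a line like "Friday, August 19" across several lines, which looks a lot like the output in the question.

```python
from typing import Callable, Iterable


def accumulate_markdown(chunks: Iterable[str]) -> str:
    """Concatenate streamed text chunks exactly as received.

    No separator is inserted and nothing is trimmed: each chunk already
    carries whatever whitespace belongs between it and its neighbours.
    """
    return "".join(chunks)


def render_when_complete(chunks: Iterable[str], render: Callable[[str], str]) -> str:
    """Render once, after the stream has ended, using the caller's renderer.

    `render` is a placeholder for a real markdown-to-HTML function.
    """
    return render(accumulate_markdown(chunks))


if __name__ == "__main__":
    # Hypothetical chunks, shaped like the "Date" line from the question.
    demo_chunks = ["- **Date**: Friday,", " August", " 19\n"]
    print(repr(accumulate_markdown(demo_chunks)))
```

If the accumulated string still contains stray blank lines after this, the extra whitespace is probably being introduced before the chunks reach your code, and it is worth logging the raw chunks with `repr` to see exactly what arrives.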

The goal is actually to have a stream processor that can act on the various markdown containers and their state as data is being received, releasing any “in the clear” enclosed production as soon as it is determinate, while holding back things like fenced code blocks, or trusting that they will be closed once the intention is understood.

That is hard to do for complete CommonMark compliance, but I’ve been occasionally plugging away at the same, focusing on all the edge cases I can make an AI produce.
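To make the idea concrete, here is a rough sketch of such a stream processor in Python. It is not CommonMark compliant and is not taken from any existing library: `MarkdownStreamBuffer`, `feed`, and `flush` are hypothetical names, and the rules are deliberately crude. A block is released when a blank line ends it, and everything inside a fenced code block is held back until the closing fence arrives. Any line starting with three backticks toggles the fence state, so tilde fences, nested fences, and loose lists are all handled incorrectly.

```python
from typing import List

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


class MarkdownStreamBuffer:
    """Release complete markdown blocks as chunks arrive (simplified sketch)."""

    def __init__(self) -> None:
        self._partial = ""           # trailing text that has no newline yet
        self._block: List[str] = []  # complete lines of the block in progress
        self._in_fence = False       # True while inside an unclosed code fence

    def _release(self, released: List[str]) -> None:
        # Move the block in progress (if any) to the released list.
        if self._block:
            released.append("\n".join(self._block))
            self._block = []

    def feed(self, chunk: str) -> List[str]:
        """Add a chunk and return any blocks that are now complete."""
        released: List[str] = []
        self._partial += chunk
        # Everything before the last newline is a complete line.
        *lines, self._partial = self._partial.split("\n")
        for line in lines:
            is_fence = line.lstrip().startswith(FENCE)
            if self._in_fence:
                self._block.append(line)  # hold back everything inside the fence
                if is_fence:              # closing fence: the code block is complete
                    self._in_fence = False
                    self._release(released)
            elif is_fence:                # opening fence: finish the previous block first
                self._release(released)
                self._in_fence = True
                self._block.append(line)
            elif not line.strip():        # blank line ends the current block
                self._release(released)
            else:
                self._block.append(line)
        return released

    def flush(self) -> List[str]:
        """Call at end of stream: release whatever is left, even an unclosed fence."""
        released: List[str] = []
        if self._partial:
            self._block.append(self._partial)
            self._partial = ""
        self._in_fence = False
        self._release(released)
        return released
```

Each released block can be passed to a normal markdown renderer on its own. The trade-off in this sketch is latency: a long code block produces no output until its closing fence shows up, which is the "holding back" behaviour described above rather than the "trust that it will be closed" alternative.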

Maybe somebody will find it useful: here’s another attempt to create a markdown stream parser (NPM package).

github https://github.com/Lixpi/markdown-stream-parser

I also posted it here: “Another attempt to parse Markdown stream in real time”

You can preview it on the demo page here

Please keep in mind that it’s at a very early stage, but I find it quite useful for my own purposes.
Also, please scroll to the end of the README on the GitHub page, where you can find some information about potentially more advanced techniques.

Any criticism, feedback, suggestions etc are absolutely welcome!
