Upload fails when txt or md contains a non-printable character

I lost about an hour tracking down a problem while trying to upload some markdown files for use in a vector store for my RAG assistant. Most files worked fine, but a few failed with an “unsupported_file” error. The error message was canned and unhelpful. It took me quite a long time to discover that these failing files included a non-printable Unicode character (\u001f in my case) somewhere deep in the middle of the file.
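
For anyone hitting the same thing, here is a minimal sketch of the kind of check that would have saved me the hour. It is only my guess at what trips the upload: I am assuming the culprits are Unicode control characters (general category Cc) other than tab, newline, and carriage return. I don't know the exact rule the service applies, and `find_control_chars` is just a name I made up for this example.

```python
import unicodedata

# Assumption: ordinary whitespace controls are accepted by the uploader.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_control_chars(text):
    """Return (line, column, code point) for each suspicious control character.

    Lines and columns are 1-based. Lines are split on "\n" only, because
    str.splitlines() also breaks on some control characters and would
    throw the line numbers off.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS:
                hits.append((line_no, col_no, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python find_control_chars.py notes.md other.md
    for name in sys.argv[1:]:
        content = Path(name).read_text(encoding="utf-8")
        for line_no, col_no, code in find_control_chars(content):
            print(f"{name}:{line_no}:{col_no}: control character {code}")
```

Running it over the files before uploading prints the file, line, and column of each hit, which is the information I wish the error had included.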

It’s fair enough that you choose to reject such a file (although that is debatable). But you could save people a lot of time if you provided better error messages. After all, you must have already detected the non-printable character in order to reject the file. Why not say so in the error message?
