File content in JSON response truncated after ":" in URLs

I am requesting the content for a series of files to be given in JSON format. Below is an abbreviated version of the response.

{
    "files": {
        "main.py": "{complete file content included}",
        ...
        "README.md": "# Personal Library API\n\nA RESTful API for managing your personal library, built with Flask and SQLite.\n\n## Features\n- CRUD for books and authors\n- Track reading status, progress, and dates\n- Search, filter, and sort books\n- Pagination for all list endpoints\n- Input validation and error handling\n- Automated tests\n\n## Setup\n\n1. **Install dependencies:**\n\n    ```sh\n    pip install -r requirements.txt\n    ```\n\n2. **Set environment variable:**\n\n    ```sh\n    export DATABASE_URL=\"sqlite:
        ...
        "src/models.py": "{complete file content included}",
        ...
    }
}

I am familiar with “URL mangling” but this seems a little different. Notice that for “README.md” is breaks right after the “sqlite:”, presumably on the double slash “//”, and never closes the quote. It then continues on to complete the next element successfully. If the LLM doesn’t happen to include a URL it works perfectly fine. It only breaks if it does include a URL.
I am generous with token limit so that isn’t the cause and, even if it was, it would truncate at the token limit, not somewhere in the middle, then continue on.

I also include language on escaping URLs:


            Return a raw JSON object with the following structure, without markdown formatting or comments. All URLs, including HTTP, HTTPS, and database protocols, must be properly escaped — meaning:
            Enclose URLs in double quotes.
            Escape backslashes with \\.
            Escape forward slashes with \/.
            Escape double quotes with \".
            Percent-encode reserved characters if necessary.
            Ensure the JSON is complete, valid, and well-formed.
            Example:
            {{
                "website": "http:\/\/example.com",
                "database": "sqlite:\/\/\/path\/to\/db"
            }}

The end result is JSON parsing fails because the element isn’t well-formed.

Anyone seen this behavior? Any strategies for getting the LLM to complete it?

There is no need to escape forward slashes in JSON. Forcing the AI to produce such a pattern might make the API output handler get confused or something.

That’s the first thing I’d try.

Then try it with logprobs enabled, and examine the compliment of top-20 tokens at that position and see if producing the correct thing was indicated, was a lower option, or if the AI just went completely wrong with its token output.

Completely agree, there shouldn’t be any need to escape forward slashes. The only reason I tried that was because it was breaking on what I am guessing is a “//” that followed the “:”, because identical behavior occurs with “http:”.

Good callout on logprobs. I’ll try that.

First of all, brilliant! logprobs showed me that it was indeed choosing “//” as the start of the next token, which caused me to look closer at my code.
Not so brilliant: The root cause was an obsolete function call I didn’t think I was doing anymore, and that was truncating the file data in the JSON. Mystery solved, thanks to your suggestion. :handshake:

1 Like