What is the current best option for providing documentation to LLMs so they can use an open-source library?


There is an open-source library I am using that has a few .rst (Sphinx documentation) files in addition to clean code. I am interested in providing the package to LLMs, either just the .rst files meant for humans or the .py files themselves.

What are the best open-source options to provide either a small codebase or human-readable documentation to LLMs so they can use the tool?

What do you mean by “open-source”? This sounds more like a 20-30 line Python script…

something like this:

#!/usr/bin/env python
import argparse
from pathlib import Path

def gather_files(root, ext):
    # Recursively collect every file under root with the given extension
    return list(Path(root).rglob(f'*{ext}'))

def main():
    parser = argparse.ArgumentParser(description="Aggregate .py or .rst files for LLM ingestion")
    parser.add_argument("root", help="Root directory of the package")
    parser.add_argument("mode", choices=["code", "doc"], help='Choose "code" for .py or "doc" for .rst files')
    parser.add_argument("--outfile", default="llm_input.txt", help="Output file to aggregate content")
    args = parser.parse_args()

    ext = ".py" if args.mode == "code" else ".rst"
    files = gather_files(args.root, ext)
    
    # Concatenate all files into one text file, each prefixed with a header
    # so the LLM can tell where one file ends and the next begins
    with open(args.outfile, "w", encoding="utf-8") as out:
        for file in files:
            out.write(f"=== {file} ===\n")
            out.write(file.read_text(encoding="utf-8"))
            out.write("\n\n")
    print(f"Aggregated {len(files)} {ext} files into {args.outfile}")

if __name__ == "__main__":
    main()

You could also use a shell script, or run a script that analyses them all and writes a small summary of what each one is useful for, then feed that combined summary with every request (a rough Python sketch follows the example below), like so:

hey bot… you could use stuff from this list:

[list]
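
If it helps, here is a minimal sketch of that idea using the OpenAI Python SDK; the model name, prompt wording, and the docs/ path are placeholders, not anything the library ships with:

#!/usr/bin/env python
# Sketch: summarize each file once, then prepend the summary list to every request.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def summarize_file(path):
    text = path.read_text(encoding="utf-8")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap summarization model works
        messages=[
            {"role": "system", "content": "Summarize what this file is useful for in one or two sentences."},
            {"role": "user", "content": f"File: {path.name}\n\n{text}"},
        ],
    )
    return response.choices[0].message.content.strip()

def build_summary_prompt(root, ext):
    lines = ["hey bot… you could use stuff from this list:"]
    for path in sorted(Path(root).rglob(f"*{ext}")):
        lines.append(f"- {path}: {summarize_file(path)}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Prepend this string to every later request as a system message.
    print(build_summary_prompt("docs/", ".rst"))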

Or you go full overkill and use LangChain or LlamaIndex… or combine a Neo4j graph database with several analyzers that feed an information graph built from the .rst files and the code, together with embeddings.

And then maybe train a GNN from the graph :smile:

The problem is that the full Python files and all the RST files can’t fit into a context window, which is why I was thinking of using RAG. And code is a little different from plain text. But maybe the solution here is just thoughtful chunking of the code, so the LLM can ask for a particular piece of information, which then pulls in the relevant documentation, etc.?

So a single RST doesn’t fit into the o3-mini context of what, 200k?

Let it give you a summary and a file location, then give that to the model and tell it that it can get the full description with a tool call (a rough sketch follows the example below)…

Like so:

hey bot you can read full rst descriptions when using the rst reader tool. Here is a short summary of each file:

1.rst - contains x…
2.rst - contains y…
3.rst - contains z…
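
A minimal sketch of that “rst reader tool” with OpenAI function calling; the tool name, summaries, and file paths are made up for illustration:

import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "read_rst",
        "description": "Return the full text of an .rst documentation file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string", "description": "Path to the .rst file"}},
            "required": ["path"],
        },
    },
}]

messages = [
    {"role": "system", "content": (
        "You can read full rst descriptions with the read_rst tool. Here is a short summary of each file:\n"
        "1.rst - contains x…\n2.rst - contains y…\n3.rst - contains z…"
    )},
    {"role": "user", "content": "How do I configure x?"},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

# If the model asked for a file, read it and send the content back for a final answer.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        path = json.loads(call.function.arguments)["path"]
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": Path(path).read_text(encoding="utf-8"),
        })
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)

print(response.choices[0].message.content)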

Or tell o3 to give you a summary of an RST but only every second token haha

@edwinarbus wouldn’t that also be an idea for summarizing older chat messages, so the chats get better memory?

To efficiently manage long conversations within token limits, chat messages can be stored in a fast-access database like Redis, keeping both the full version and a compressed summary using an nth-token approach. On each follow-up request, the system sends only the summarized past messages to the model, preserving context while reducing token usage. If the user asks for specific details, the system can retrieve and send the full, original message that best matches the query, using semantic search or keyword matching on the original text (or use embeddings, just to select the best-matching full context that needs to be sent). This approach ensures efficient memory management while still allowing access to precise information when needed.
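
Here is a rough sketch of that flow; the key names and the nth-token summarize() helper are just assumptions for illustration:

import redis

r = redis.Redis(decode_responses=True)

def summarize(text):
    # Placeholder nth-token compression; in practice you might call a model instead.
    return " ".join(text.split()[::2])

def store_message(chat_id, index, text):
    # Keep both the full message and its compressed summary for each turn.
    r.hset(f"chat:{chat_id}:{index}", mapping={"full": text, "summary": summarize(text)})

def build_context(chat_id, n_messages):
    # Default context sent to the model: only the summaries of past turns.
    return [r.hget(f"chat:{chat_id}:{i}", "summary") for i in range(n_messages)]

def expand_best_match(chat_id, n_messages, query):
    # Naive keyword match on the original text; embeddings would be the better
    # way to pick which full message to send back when details are requested.
    best, best_score = "", -1
    for i in range(n_messages):
        full = r.hget(f"chat:{chat_id}:{i}", "full")
        score = sum(word in full.lower() for word in query.lower().split())
        if score > best_score:
            best, best_score = full, score
    return best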

Or here is a compressed version

To manage long within limits, messages be in fast-access like keeping full and compressed using nth-token or summarization On follow-up, system only summarized messages model, context reducing usage. user for details, system retrieve send full, message best query semantic keyword This efficient management allowing precise needed.

Or give this to the devs:

tokens = tokens[::2]

which could be applied with tiktoken and potentially reduce the API costs by 50% :smiley:
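
For completeness, roughly what that looks like with tiktoken (the encoding name is an assumption, and the decoded output is of course as garbled as the compressed paragraph above):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "To efficiently manage long conversations within token limits..."
tokens = enc.encode(text)
tokens = tokens[::2]  # drop every second token, roughly halving the token count
print(enc.decode(tokens))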