AI That Can Truly Learn and Retain My Codebase

Hey everyone,

I’m on a mission to automate coding with AI, not just as a copilot but as a full-fledged developer that understands my projects like a real teammate. My goal is to train an AI model on my Laravel PHP codebase so that I can ask it to implement features, refactor existing code, and maintain consistency—just like a human developer would.

I initially tried Cursor AI, hoping I could train it on my coding style and architecture. However, I hit a major roadblock: it doesn’t retain knowledge between sessions. Every time I restart, it forgets everything I taught it, which makes long-term learning impossible.

Now, I’m exploring self-hosted models like StarCoder2, Code Llama, or DeepSeek Coder, but I need a setup where:

  1. The AI can persistently learn my codebase over time
  2. It can understand and follow my coding patterns
  3. I can query it for feature development and get cohesive, structured outputs

I’ve already started converting my Laravel code into JSONL format to train a model (a rough sketch of my export script is below, after my questions), but I’d love to hear from the community:

  • Has anyone successfully trained an AI to retain project-specific knowledge over time?
  • Which model would best suit this kind of long-term learning?
  • Any advice on fine-tuning these models efficiently?
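
For context, here’s roughly how I’m producing the JSONL today. This is only a sketch of my export script; the instruction/output field names and the app/ path are my own conventions, not a required format:

```python
# Sketch of my JSONL export: one {"instruction", "output"} record per line.
# Field names and paths are my own convention, not a required format.
import json
from pathlib import Path

with open("laravel_codebase.jsonl", "w", encoding="utf-8") as out:
    for php_file in Path("app").rglob("*.php"):
        record = {
            "instruction": f"Implement {php_file} following our Laravel conventions.",
            "output": php_file.read_text(encoding="utf-8"),
        }
        out.write(json.dumps(record) + "\n")
```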

Would love to hear your insights! 🚀


You’re facing a fundamental design limitation of current AI models — they’re essentially “stateless” between sessions unless you explicitly retrain them. Here’s why:

  1. Transformer models (like StarCoder2, Code Llama, and DeepSeek Coder) are trained on fixed datasets. Fine-tuning lets you adapt them to your codebase, but once trained, they can’t dynamically update or remember new information between sessions without additional fine-tuning.
  2. RAG (Retrieval-Augmented Generation) can simulate long-term memory by storing code and patterns in a vector database (like Pinecone, Weaviate, or Qdrant) and feeding it to the model at runtime. However, this still requires setting up a persistent external memory — the model itself isn’t “learning.”
  3. Ideal Solution? You’d need a hybrid system:
  • A self-hosted model (like Code Llama or StarCoder2) for inference.
  • A vector database or fine-tuning pipeline to store your coding patterns and context.
  • A retrieval layer to inject context dynamically during inference.
  4. Challenges:
  • Fine-tuning StarCoder2 on your JSONL data is possible, but it’s expensive and time-consuming.
  • A vector database approach is more scalable but requires thoughtful context window management (e.g., chunking your code to avoid token limits; a minimal sketch follows this list).
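
To make the chunking point concrete, here’s a minimal sketch. It splits naively on lines (for PHP you’d rather split on class/method boundaries), and the sizes are arbitrary assumptions:

```python
def chunk_code(source: str, max_lines: int = 60, overlap: int = 10) -> list[str]:
    """Split a source file into overlapping line-based chunks.

    The overlap keeps context that straddles a boundary retrievable from
    both neighboring chunks; tune max_lines to your embedder's token limit.
    """
    lines = source.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunk = "\n".join(lines[start:start + max_lines])
        if chunk.strip():  # skip empty tails
            chunks.append(chunk)
    return chunks
```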

Practical Advice:

  • Start with a RAG setup: store your JSONL-converted codebase in a vector database and use embeddings to provide context during queries (minimal retrieval sketch after this list).
  • Fine-tune only if you hit a ceiling with RAG performance.
  • StarCoder2 is probably the best base for fine-tuning on code, since it was trained specifically on source code.
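
To show what the retrieval step looks like in miniature, here’s a sketch using sentence-transformers with a plain in-memory index as a stand-in for Qdrant/Weaviate/Pinecone; the embedding model name and example chunks are placeholders:

```python
# Minimal RAG retrieval loop. The in-memory matrix stands in for a real
# vector database; swap it for Qdrant/Weaviate/Pinecone in production.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

chunks = [
    "class UserController extends Controller { /* ... */ }",
    "public function orders() { return $this->hasMany(Order::class); }",
]
index = embedder.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q  # vectors are normalized, so dot product == cosine
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks get prepended to the prompt you send to the code model:
context = "\n\n".join(retrieve("How do users relate to orders?"))
```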

Persistent learning isn’t really solved yet — but a hybrid RAG + fine-tuning approach is your best shot with current tech.
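
If you do hit that ceiling and decide to fine-tune, parameter-efficient methods like LoRA keep it affordable on a single GPU. Here’s a minimal sketch with transformers + peft + datasets, assuming JSONL records with instruction/output fields; the model size, target modules, and hyperparameters are illustrative, not tuned:

```python
# LoRA fine-tuning sketch. Model choice, LoRA target modules, and all
# hyperparameters below are illustrative assumptions, not a recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bigcode/starcoder2-3b"  # smallest StarCoder2 variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(model_name),
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
               target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
               task_type="CAUSAL_LM"),
)

dataset = load_dataset("json", data_files="laravel_codebase.jsonl", split="train")

def tokenize(batch):
    # Concatenate each instruction and its output into one training sequence.
    texts = [i + "\n" + o for i, o in zip(batch["instruction"], batch["output"])]
    return tokenizer(texts, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="starcoder2-laravel-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```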


Hey,

Thanks a lot for the detailed breakdown; really appreciated! I’ve started working with Cursor by defining rules for coding standards, adding unit tests, and keeping the code well-documented. I haven’t used the larger models like StarCoder or LLaMA yet; Cursor has been handling things fine so far.

That said, I’m running into a challenge:

While I’ve set coding standards through rules, I’m unsure how to define database relations and key columns effectively. My database structure is a bit inconsistent, so I want to make sure Cursor consistently understands and applies the correct relationships and critical fields.

Any suggestions on how to:

  1. Embed database schema knowledge (like table relationships and important columns) into the model’s rules? (I’ve sketched one idea below.)
  2. Handle this efficiently: would a RAG setup be too much, or is there a simpler approach?
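
For #1, the direction I’m sketching is a script that dumps the schema’s foreign keys into a markdown file my Cursor rules can point at, so relationships are stated explicitly instead of being inferred from my inconsistent table names. Rough sketch, assuming MySQL and mysql-connector-python; the credentials, database name, and output path are placeholders:

```python
# Dump MySQL foreign-key relationships into a markdown file that a Cursor
# rule can reference. Credentials and paths below are placeholders.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="my_app")
cur = conn.cursor()
cur.execute("""
    SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE TABLE_SCHEMA = DATABASE() AND REFERENCED_TABLE_NAME IS NOT NULL
    ORDER BY TABLE_NAME
""")

with open("docs/db-relationships.md", "w", encoding="utf-8") as f:
    f.write("# Database relationships\n\n")
    for table, column, ref_table, ref_column in cur.fetchall():
        f.write(f"- `{table}.{column}` references `{ref_table}.{ref_column}`\n")

conn.close()
```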

I’m open to exploring new methods if it means improving consistency. Again, I appreciate the insights and look forward to hearing your thoughts!