Maintaining context in code completions

What are some sophisticated solutions to maintain large prompt context for code completion?

For example, right now I query the model with a task (e.g. make a plot of this dataframe), and all the tokens not used by the query are filled with the previously written code as context. If the token limit is reached, the code is simply truncated.

What is a better way to compress or summarize the code so that the context keeps more of the information relevant to the question? Would embeddings work for code snippets? If so, how would you chunk and query the code?
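One way to sketch the chunk-and-query idea, assuming the earlier code is Python: split the file into one chunk per top-level function, class, or statement with the standard `ast` module, rank the chunks by similarity to the task, and pack the best ones into a fixed budget instead of truncating. To keep the sketch self-contained, a bag-of-words cosine similarity stands in for real embeddings (a deliberate swap, not an embedding model), and the budget counts whitespace-separated words rather than model tokens. `select_context` and the other helper names are hypothetical.

```python
import ast
import math
import re
from collections import Counter


def chunk_by_definition(source: str) -> list[str]:
    """Split Python source into one chunk per top-level statement.

    Each function, class, import, or assignment at module level becomes
    its own chunk, so a chunk never cuts a definition in half.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        start = node.lineno  # 1-based; end_lineno needs Python 3.8+
        decorators = getattr(node, "decorator_list", [])
        if decorators:
            start = decorators[0].lineno  # include decorators above a def
        chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


def term_counts(text: str) -> Counter:
    # Letters only, so plot_sales -> "plot", "sales" and df.plot -> "df", "plot".
    return Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    # Stand-in for embedding similarity: cosine over term counts.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_context(source: str, query: str, budget: int) -> list[str]:
    """Keep the chunks most similar to query whose total size fits budget.

    Size is approximated by whitespace-separated words, not model tokens.
    """
    chunks = chunk_by_definition(source)
    q = term_counts(query)
    # Most relevant first; sorted() is stable, so ties keep file order.
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: cosine(term_counts(chunks[i]), q),
        reverse=True,
    )
    keep, used = set(), 0
    for i in ranked:
        cost = len(chunks[i].split())
        if used + cost <= budget:  # skip chunks that no longer fit
            keep.add(i)
            used += cost
    # Return in original file order so the prompt reads like real code.
    return [chunks[i] for i in sorted(keep)]


if __name__ == "__main__":
    code = '''
import pandas as pd

def load_sales(path):
    return pd.read_csv(path)

def plot_sales(df):
    df.plot(x="month", y="revenue")

def send_report(email):
    print("sending to", email)
'''
    # Keeps the import, load_sales and plot_sales; send_report is dropped.
    print("\n\n".join(select_context(code, "make a plot of the sales dataframe", budget=12)))
```

With a real embedding model, `term_counts` and `cosine` would be replaced by an embedding call and vector similarity; the chunking and budget-packing steps could stay the same.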

Cheers!

  1. Instead of giving the entire function, just give the API or header.
  2. Only give the details of the one specific function that needs to be changed. In other words, think of the problem as a tree and only work on the leaves. I follow more heuristics, but they are hard to put into words and would only be confusing, so I am leaving them out until I can convey them more clearly.
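Both heuristics can be combined in a small preprocessing step. A minimal sketch, assuming the previous code is Python: the standard-library `ast` module replaces every function body with `...` (keeping signatures and docstrings, i.e. the "API"), except for the function(s) being changed, which keep their full body. `StubBodies` and `compress_source` are hypothetical names for illustration.

```python
import ast
from collections.abc import Iterable


class StubBodies(ast.NodeTransformer):
    """Replace each function body with `...` unless its name is in keep.

    Signatures and docstrings survive, so the model still sees the API.
    """

    def __init__(self, keep: set[str]):
        self.keep = keep

    def _stub(self, node):
        if node.name in self.keep:
            return node  # the "leaf" being worked on keeps its full body
        new_body = []
        if ast.get_docstring(node) is not None:
            new_body.append(node.body[0])  # the docstring expression
        new_body.append(ast.Expr(ast.Constant(value=...)))
        node.body = new_body
        return node

    # Methods inside classes are reached through generic_visit on ClassDef.
    visit_FunctionDef = _stub
    visit_AsyncFunctionDef = _stub


def compress_source(source: str, keep: Iterable[str] = ()) -> str:
    """Return source with every function body stubbed except those in keep."""
    tree = StubBodies(set(keep)).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


if __name__ == "__main__":
    code = '''
def load_sales(path):
    """Read the sales CSV."""
    return pd.read_csv(path)

def plot_sales(df):
    df.plot(x="month", y="revenue")
'''
    # load_sales shrinks to its signature and docstring; plot_sales is kept whole.
    print(compress_source(code, keep={"plot_sales"}))
```

Methods are stubbed as well, so a class collapses to its method signatures; adding names to `keep` is how you would include a second function that the change touches.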

I have not used embeddings, so I can't comment on them. So far, for the tasks I expect an LLM such as ChatGPT to handle, I have been able to get it to work by writing better prompts.


Here is an actual prompt in the ballpark of your question. It generates correct code, including build steps, and asks for a demonstration; notice the level of prompt engineering needed. The prompt took about 10 tries to get working correctly, but I learned quite a bit in creating it.

Create an SWI-Prolog foreign language example for MSYS2, details follow:
  * Working directory: '/home/Groot/Example' 

For the C code
  * Create a file 'capstone_pl.c'
  * There is a single C function
    a. Signature: 'static foreign_t cs_version_pl(term_t major, term_t minor)'
    b. Uses Capstone function: 
      'unsigned int CAPSTONE_API cs_version(int *major, int *minor)'
  * The code should be for 64-bit Windows
  * Include instructions to 
     a. Compile code using 'gcc'
     b. Link files into Windows DLL using 'gcc' 
  * Be specific about which MSYS2 terminals to use, e.g. MSYS2 MSYS or MSYS2 MINGW64
  * 'PL_fail' is a preprocessor macro, so 'return' is not needed before 'PL_fail'
  * 'PL_fail' does not take any arguments.
  * Include 'PL_register_foreign'
  * Do not add conditional compilation directives for '__cplusplus'
  * To set the term_t value, do not use 'PL_put_*'; use 'PL_unify_*'
  * 'PL_unify_*' returns a status code; use the return code accordingly.
  * Do not include a 'main' function in the code. 

For the SWI-Prolog code
   * Create a file 'capstone.pl' 
   * Use 'use_foreign_library' as a Prolog directive.
   * Create a predicate to demonstrate using 'cs_version_pl'

For the compile step
  * Add '-I/usr/include/capstone  -I/mingw64/lib/swipl/include'

For the link step
  * Add '-L/usr/lib -lcapstone -L/mingw64/lib/swipl/bin/x64-win64 -lswipl'

Demonstrate running of code using SWI-Prolog with expected output.