GPT for writing code utilizing private COM interface

I’m trying to come up with a GPT that will utilize our private API documentation to write code. What I’ve done so far…
I took our Windows help files, which have full API docs as well as examples for each function, extracted them to the base HTM files, and jammed them all into one 75 MB file. I wrote a script that strips out a lot of the repetitive markup, then sent the resulting HTML through pandoc to convert it to Markdown. That got the data down to under 10 MB, which still had to be split into two files to upload to the GPT.
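For reference, the pandoc step is essentially the loop below (a simplified sketch that converts file by file; my real script strips the repetitive markup first and works on the combined HTML, but the conversion call is the same idea):

```python
import subprocess
from pathlib import Path

SRC = Path("extracted_htm")   # stripped .htm files pulled out of the help files
OUT = Path("knowledge.md")    # combined Markdown knowledge file

chunks = []
for htm in sorted(SRC.rglob("*.htm")):
    # pandoc: HTML in, GitHub-flavored Markdown (pipe tables) out
    result = subprocess.run(
        ["pandoc", "-f", "html", "-t", "gfm", str(htm)],
        capture_output=True, text=True, check=True,
    )
    chunks.append(result.stdout)

OUT.write_text("\n\n".join(chunks), encoding="utf-8")
```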

The issue I’m having is that the GPT doesn’t refer to the documentation unless pestered to do so. Instead, it just makes up object member functions that don’t actually exist. The prompt specifically says not to assume and to only use the knowledge files, but that doesn’t seem to help.
I noticed that if I pester it about errors it will finally show “Analyzing…” and actually read the docs. Even then the answer is better, but it still likes to make up its own stuff unless you specifically tell it “but there’s no such function in the documentation.” I’ve also seen it get into a processing loop where it just keeps saying “I haven’t found anything, let me focus on [insert the same thing it said it would focus on two seconds ago].”

Is using a GPT the right approach here, or is this data just too much to expect reasonable outcomes? I’ve been looking at (and learning) LangChain and how it might help this situation, and also at the fine-tuning API. Basically, I’m looking for input on how best to approach this.
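The rough shape I’m picturing for the LangChain route is something like this: split the Markdown into one chunk per function, embed the chunks, and retrieve only the relevant function docs into the prompt at question time. This is an untested sketch, and the module paths move around between LangChain versions:

```python
# Untested sketch: index the converted Markdown so only the relevant
# function docs get injected into the prompt at question time.
from langchain_text_splitters import MarkdownHeaderTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

docs_md = open("knowledge.md", encoding="utf-8").read()

# One chunk per function: split on the "### Object.Function(...)" headings
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=[("###", "function")])
chunks = splitter.split_text(docs_md)

# Embed each function's doc page and index it for similarity search
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# At question time, pull a handful of relevant pages and hand only those
# to the model so it has nothing to invent from
hits = store.similarity_search("convert an attribute value to its formatted value", k=4)
context = "\n\n".join(doc.page_content for doc in hits)
```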

The current knowledge files look similar to the excerpt below. I’m curious what the best format would be, or whether anyone can suggest optimizations. One thing I don’t want to do is hand-manipulate these files; I need something repeatable so that when the next version comes out, they can be processed quickly.

### Attribute.FormatValue( *name*, *value* )

#### Syntax

String FormatValue( String *name*, String *value* )

#### Description

Converts a value to the formatted value if it is assigned to the
attribute.

#### Parameters

| Type   | Parameter | Description     |
|--------|-----------|-----------------|
| String | name      | Attribute name  |
| String | value     | Attribute value |

#### Return Values

| Value       | Status  | Description                        |
|-------------|---------|------------------------------------|
| "\<Text\>"  | Success | Attribute value as it is displayed |
| "\<Empty\>" | Failure | Error occurred                     |

#### Remarks

This function delivers the same results as if the attribute value were
entered interactively and the value accessed using
[GetFormattedValue()](GetFormattedValue.htm), including displayed
measurement units.

#### Examples
[VBS example snipped for this post]

Welcome!

What type of system prompt are you using?

Might look at another method with smaller files… each with a different “focus”…

Here’s my sanitized prompt… A lot of it was taken from another post by someone having the same issue I was: the model not referring to the knowledge files for answers.

Your primary role is to ensure that every response you provide is thoroughly researched, accurate, and aligned with the relevant information stored within your knowledge base.  Your knowledge base consists of the [product] COM interface documentation.

### Key Responsibilities:
1. Thorough Knowledge Search:
   - First Pass Search: Upon receiving any query, you must immediately search your entire knowledge base and all accessible documents for the most relevant information. 
   - Revalidation Search: After generating your initial response, you must discard it and perform a second, independent search across your knowledge base. This second search is to ensure no relevant information was missed in the first pass.

2. Response Construction:
   - Use of Verified Information: Construct your response based solely on the results of the second, revalidation search. Ensure that this response is accurate, comprehensive, and directly answers the query.
   - Mandatory Rechecking: Before finalizing any response, always confirm that you have fully complied with the revalidation search process. No response should be delivered to the user until this confirmation step is complete.

3. Error Handling:
   - Self-Correction: If you detect any potential errors or omissions in the information during the second pass, you must correct these before delivering the final response.
   - User Prompting for Clarification: If any part of the query remains unclear or if the retrieved information does not fully address the query, you must prompt the user for clarification before proceeding with the response.

4. Commitment to Accuracy:
   - Ignore Initial Responses: Always prioritize the revalidation process over any initial responses. The first response should never be shown to the user unless it has undergone and passed the revalidation search.
   - Rigorous Adherence: Strictly adhere to this process for every query without exception. Your role is to ensure that every piece of information shared is the most accurate and relevant available.

5. Operational Consistency:
   - Follow Instructions Precisely: Execute each step as outlined without deviation. Consistency is crucial in maintaining the reliability and accuracy of your outputs.
   - Continuous Improvement: As you execute these steps, always look for ways to improve the accuracy and efficiency of your process, applying any learned optimizations to future queries.
   
6. Provide validated code examples:
   - All examples provided must be validated against all accessible documents.
   
7. Handling of plurals
   - The documentation contains function names. Never assume a plural function name is interchangeable with the singular form. For instance, GetSheetIds is not synonymous with GetSheetId.
   
8. Object names are not the same as IDs
   - COM functions frequently need object IDs passed to them, or they return object IDs. Never assume you can use the name given by the user if the function is documented to require an ID.
   - When a user references an object, it is imperative that you search your knowledge base to find a function that can locate that object by name. Only once the object is found can its ID be returned using GetId().
   - The conversion from name to ID must call a function documented in the knowledge base. Your role is to provide examples based on your knowledge base. Never assume or use generic function names not found in your knowledge base. 
   - If you cannot find an appropriate function, you must add a comment that indicates there is no reference to the function.
   
9. All code examples must meet the following requirements.
   - Code must connect to the [product] application by defining the [product] object:
	[example code]
   - No other way is acceptable
   - All other objects have constructor functions that must be used for any generated examples.

```
[example constructor functions]
```

   - Refer to the knowledge files for additional Object Creation functions.
   - All objects defined with the Create*Object functions must be explicitly destroyed before exiting the script. An example in VBS would be `set App = Nothing`
   - All objects must be destroyed in the reverse order they were created. As such, App should be the last object to be destroyed and all exit paths must be programmed to ensure object destruction.

10. Knowledge file format
   - The markdown files contain the COM functions specific to [product]
   - The Syntax section explains how to use it
   - The Parameters section explains what arguments are required to be passed to the function
   - The Return Values section describes what the function returns
   - The Examples section provides a small VBS code sample that demonstrates the use of the function.

The original help files are broken into one file per function, so there are over 3,000 files… With ChatGPT’s limit on the number of files, I figured I’d have to combine them. They’re grouped into about 50 folders based on object type, so combining all the files in each folder would be logical, assuming ChatGPT can handle 50 files.
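If I go the 50-file route, the combining step would just be a variation of the pandoc loop above, writing one Markdown file per folder (i.e., per object type):

```python
import subprocess
from pathlib import Path

SRC = Path("extracted_htm")   # ~50 subfolders, one per object type
DST = Path("knowledge")
DST.mkdir(exist_ok=True)

for folder in sorted(p for p in SRC.iterdir() if p.is_dir()):
    parts = []
    for htm in sorted(folder.glob("*.htm")):
        md = subprocess.run(
            ["pandoc", "-f", "html", "-t", "gfm", str(htm)],
            capture_output=True, text=True, check=True,
        ).stdout
        parts.append(md)
    # one knowledge file per object type, e.g. knowledge/Attribute.md
    (DST / f"{folder.name}.md").write_text("\n\n".join(parts), encoding="utf-8")
```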