We are building an AI tool that allows querying data from an SQL database. LangChain is used as the foundation for instructing the models, maintaining chat history, and integrating with different LLM providers/APIs.
The main idea: the user writes a natural-language question about the data, possibly including dates and other filters. Using an LLM, the goal is to have the model understand the user’s intent, construct the corresponding SQL query, obtain the results, and return a response.
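For context, the setup is roughly the following minimal sketch (assuming the langchain, langchain-openai, and langchain-community packages; the deployment name and SQLite database are placeholders, and our actual configuration differs in detail):

```python
from langchain_openai import AzureChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain.chains import create_sql_query_chain

# Connect to the database; LangChain introspects the schema for the prompt.
db = SQLDatabase.from_uri("sqlite:///sales.db")  # placeholder database

# Credentials come from AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY env vars.
llm = AzureChatOpenAI(
    azure_deployment="gpt-4-turbo",  # placeholder deployment name
    api_version="2024-02-01",
    temperature=0,
)

# Turn a natural-language question into a SQL query for the given database.
chain = create_sql_query_chain(llm, db)
sql = chain.invoke({"question": "What was the total revenue in March 2024?"})
print(sql)  # the generated SQL, executed separately against the database
```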
The instructions given to the LLM were designed to specify which type of database the query should target, how to resolve dates from the user’s input, the database structure, which limits should be applied to the queries, the default currency for monetary calculations, etc. All of these instructions aim to guide the model in designing the corresponding SQL query.
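To illustrate, here is a hedged sketch of such a prompt; the rules below are placeholders covering the categories just listed, not our production prompt. If I recall correctly, a custom prompt passed to `create_sql_query_chain` must expose the `input`, `table_info`, and `top_k` variables, while static values can be filled in with `partial`:

```python
from datetime import date
from langchain_core.prompts import PromptTemplate

SQL_PROMPT = PromptTemplate.from_template(
    """You are an expert {dialect} analyst. Write a single {dialect} query
that answers the user's question.

Rules:
- Use only the tables and columns described below:
{table_info}
- Resolve relative dates ("last month", "yesterday") against {current_date}.
- Never return more than {top_k} rows; add a LIMIT clause if needed.
- Report monetary amounts in EUR unless the user asks otherwise.

Question: {input}
SQL query:"""
)

# Fill the static variables; the chain supplies input, table_info and top_k.
prompt = SQL_PROMPT.partial(dialect="SQLite", current_date=date.today().isoformat())
# chain = create_sql_query_chain(llm, db, prompt=prompt)  # as in the sketch above
```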
That being said, the tests were conducted successfully on Azure OpenAI using the “GPT-4-turbo-1106-preview” model. This model is about to be deprecated/replaced by “GPT-4-turbo-2024-04-09”, and at the same time Azure has made “GPT-4o”, the new OpenAI model, available.
When testing with “GPT-4-turbo-2024-04-09” and “GPT-4o”, everything that had previously worked stopped functioning: incoherent responses, poor comprehension of the instructions (or instructions simply being ignored), invalid queries, incorrect results, and so on.
We would expect automatic model updates on Azure (as is the case with “GPT-4-Turbo”) to be more stable, given the short lifespan of each version, which is around six months.
So…
- Is it normal that changing from one OpenAI model to another results in such a significant difference in comprehension and outcomes?
- Does any model change necessarily mean the entire set of instructions needs to be refactored?
- Should we be doing things differently?
- Are OpenAI models suitable for building such a tool?