Text-davinci-003 model giving half-cooked answers

Hi, I am using the text-davinci-003 model, but it is giving me half-cooked answers: the end lines are cut off in the response. With gpt-3.5-turbo I get the full response.

Also, the text-davinci-003 model gives me a lot of 400 errors. If I send the same query 3-4 times it eventually answers, but the first couple of times it errors out on many queries. With gpt-3.5-turbo the 400 errors go away.

Any solution for this?

Welcome to the forum!

Can you post the snippet of code that calls the API, and any setup code that it relies on, please?

import { OpenAI } from 'langchain/llms/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { ConversationalRetrievalQAChain } from 'langchain/chains';

// CONDENSE_PROMPT is in Arabic; in English: "Given the following conversation and a
// follow-up question, rephrase the follow-up question to be a standalone question."
const CONDENSE_PROMPT = `بالنظر إلى المحادثة التالية وسؤال المتابعة ، أعد صياغة سؤال المتابعة ليكون سؤالاً مستقلاً.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;

// QA_PROMPT is in Arabic; in English: "You are a helpful AI assistant. Use the following
// pieces of context to answer the question at the end. If you don't know the answer, just
// say that you don't know; do not try to make up an answer. If the question is not related
// to the context, politely respond that you are tuned to answer only questions related to
// the context."
const QA_PROMPT = `أنت مساعد AI مفيد. استخدم أجزاء السياق التالية للإجابة على السؤال في النهاية.
إذا كنت لا تعرف الإجابة ، قل فقط أنك لا تعرف. لا تحاول اختلاق إجابة.
إذا لم يكن السؤال متعلقًا بالسياق ، فأجب بأدب أنك مضبوط للإجابة فقط على الأسئلة المتعلقة بالسياق.

{context}

Question: {question}
Helpful answer in markdown:`;

export const makeChain = (vectorstore: PineconeStore) => {
  const model = new OpenAI({
    temperature: 0.2, // increase temperature to get more creative answers
    modelName: 'text-davinci-003', //'gpt-3.5-turbo-0301', //'text-davinci-003', //change this to gpt-4 if you have access
  });

  const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorstore.asRetriever(),
    {
      qaTemplate: QA_PROMPT,
      questionGeneratorTemplate: CONDENSE_PROMPT,
      returnSourceDocuments: true, //The number of source documents returned is 4 by default
    },
  );
  return chain;
};

My only recommendation is that you use the gpt-3.5-turbo model; it’s quicker, cheaper, and more powerful. May I ask why you would want davinci-003 in this scenario?

Check the finish_reason and see whether it shows length when the output is cut off.

If yes, this means “Incomplete model output due to max_tokens parameter or token limit”.
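For reference, the finish_reason is returned on each choice when you call the completion endpoint directly rather than through the langchain wrapper. A minimal sketch using the openai Node SDK; the prompt handling and the max_tokens value here are only illustrative:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function checkFinishReason(prompt: string): Promise<string> {
  const response = await client.completions.create({
    model: 'text-davinci-003',
    prompt,
    temperature: 0.2,
    max_tokens: 256, // same default the chain reports in the terminal
  });

  const choice = response.choices[0];
  // 'stop' means the model finished on its own; 'length' means the output was
  // cut off by max_tokens or the model's context limit.
  if (choice.finish_reason === 'length') {
    console.warn('Output truncated: raise max_tokens or shorten the prompt.');
  }
  return choice.text;
}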


I have tried different models to check response accuracy. The accuracy of the text-davinci-003 model is very good, as it retrieves the response data from the PDF files I have ingested into Pinecone. I also used gpt-3.5-turbo, but it gives a very general response that is not as accurate or specific.
Is there any known flaw in the text-davinci-003 model regarding the 400 errors and the text limitation?

I have checked in the terminal and it is showing me max-token-limit=256. How do I increase that to get the full response text?
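For what it’s worth, with the langchain OpenAI wrapper used in makeChain above, that 256-token completion default can usually be raised by passing maxTokens to the constructor. A sketch of just the model setup, assuming the same langchain version as the snippet; 1000 is an illustrative value, since text-davinci-003 has roughly a 4k-token context shared between the prompt, the retrieved documents, and the completion:

import { OpenAI } from 'langchain/llms/openai';

// Same model setup as in makeChain, with maxTokens added to override the
// 256-token completion default so long answers are not cut off mid-sentence.
const model = new OpenAI({
  temperature: 0.2,
  modelName: 'text-davinci-003',
  maxTokens: 1000, // illustrative; leave room in the context window for the retrieved context
});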

No. My guess is that you hit a temporary outage. You should always expect any API endpoint to return an error and build your system to handle errors gracefully.

That is to say, expect the worst and handle it in a graceful manner, and then your system will perform well in any given scenario.
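A sketch of what that graceful handling could look like around the chain returned by makeChain above; askWithRetry is a hypothetical helper, not part of langchain, and the retry count and backoff delays are arbitrary:

// Retry transient API failures (e.g. the intermittent 400s described above) with a
// simple exponential backoff instead of surfacing the first error to the user.
async function askWithRetry(
  chain: { call: (values: Record<string, unknown>) => Promise<unknown> },
  question: string,
  chatHistory: string,
  maxAttempts = 3,
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // ConversationalRetrievalQAChain expects `question` and `chat_history` inputs.
      return await chain.call({ question, chat_history: chatHistory });
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of retries, let the caller handle it
      const delayMs = 1000 * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.warn(`API call failed (attempt ${attempt}), retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}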