API gives different answers all the time

Hi Everyone,

I am trying the score a text on basis of Grammar, Structure, Content and Relevance. I have nade the fix set of rules which are pretty clear like if you do not find xyz section in the text then reduce 10 points. And similar for other attributes.

The problem is it scores most of the times differently. For example sometimes it is able to detect a certain section is available and sometimes it is not able to detect that. I am really confused as to what to do.

I tried with gpt 3.5 and 4

Questions:

  • Is every api call completely independent of any previous call? I do not want it to remember any history.
  • Should I look into fine tuning? Will it help? I have around 150 examples of texts to train the model. Is that enough with 3 epochs?
  • Any other solutions to achieve this?

Any help is greatly appreciated. Thanks!

Welcome to the forum!

Yes, it only receives what you send. Might make sure you’re sending your messages object correctly?

I would suggest trying a one-shot example in the system message (or even two-shot… ie two examples…) Or, you might try fine-tuning with your data, just make sure you set it up correctly for best results.

The old Book Genome Project (BookLamp bought by Apple and closed) did something similar, I believe… You might want to dig down that rabbit hole some… Old tech, but it worked really well… before Apple buried it…

1 Like

Thanks!
Can you elaborate more on fine tuning for my specific use case?
Here is what I understand about it.

  • I need to clean up the data(text/essays) in my case.
  • Then need to input each text with how I would have rated it based on various attributes that I mentioned.
  • And then start training.

Questions:

  • I am confused about do I need to send whole/complete text as one example or break it down into the attributes I want to train on? For example just send small text which is content score then another text for relevance score and so on.

  • Do I also provide the rubrics/rules that I have used to rate the inputed user text?

  • If the model does not improve after a couple epochs, how do I tell the model that you need to change this or that? Like how do I understand specifically what the model is not doing properly for a particular Text?