Counting author citations and so on

I’m finalizing my thesis and using GPT-4o to help count citations and gain insights from the literature review. I’ve created a specialized GPT on GPT Builder tailored to my research needs.

However, I’ve noticed that whenever I ask GPT to count which author has the most citations, it returns a significantly lower count than the actual number.

For example, when I use the prompt:
“Check the thesis document for the 10 most cited authors. Organize a table with these 10 authors and include a column with the number of times the author was cited.”

GPT tells me an author has 17 citations, but when I count in Word, I find 33 occurrences—a big discrepancy.

Does anyone have suggestions for improving this GPT? Is there a known issue with GPT counting citations in Word documents?

Thanks

I’m new to the forum and not an AI expert at all, but to count occurrences I’d personally use a different approach, such as a regex search. I use Ubuntu Linux, where this is pretty straightforward. For instance, ask GPT:

Give me a regex search pattern to use in Linux to count the number of occurrences of this author: John Smith, which may include a middle initial (John T. Smith), and the last name might be in all uppercase.

Which will return something like:

grep -ioP '\bJohn\s+([A-Z]\.\s+)?Smith\b' filename.txt | wc -l

Advantages:

  • More precise

Disadvantages:

  • Takes more time
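In case `grep -P` isn't available on your system (it is a GNU grep extension), here is a minimal Python sketch of the same idea. The function name `count_author` and the sample text are made up for illustration, and it assumes the thesis has already been saved as plain text.

```python
import re

# Same idea as the grep pattern: "John Smith" with an optional middle
# initial such as "T.", matched case-insensitively.
AUTHOR_PATTERN = re.compile(r"\bJohn\s+(?:[A-Z]\.\s+)?Smith\b", re.IGNORECASE)


def count_author(text: str) -> int:
    """Return how many times the author name appears in the text."""
    return len(AUTHOR_PATTERN.findall(text))


# Hypothetical sample text, only for demonstration.
sample = (
    "John Smith argued X. Later, John T. Smith and JOHN SMITH disagreed. "
    "Johnny Smithson did not."
)
print(count_author(sample))
```

The word boundaries (`\b`) keep lookalike names such as "Johnny Smithson" out of the count.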

I hope it helps.

Thanks for the suggestion. I use macOS. I think I can run a similar command with Perl.

Thank you


The problem with this approach is that I have to know the author's name in advance to run the command. I have more than 200 papers/authors in my literature review, so I can’t test them one by one.

I want to know who the top 10 authors in the Word document are.

I haven’t heard PERL in a loooong time.

“Pretty Eclectic Rubbish Lister” for the win! :wink:

More on topic, LLMs still find it hard to count words or even list items. One hack is to ask it to number the list in reverse, starting at 10)… it’ll usually stop at 1), giving you your ten. Then you just need to flip them (if needed)…

Let us know if it helps.


How about having the model extract the names and putting them into a list via code interpreter one by one?
Add additional data you are interested in and make it a table.
The trick is to go one step at a time so the model doesn’t lose count.
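As a rough sketch of what that extraction step could look like in code (whether run by code interpreter or locally), here is one way to tally first authors without knowing their names in advance. It assumes author-year citations such as "Smith (2020)" or "(Smith & Jones, 2019)" and that the document text is already available as a plain string; the pattern, the function name `top_cited_authors`, and the sample text are illustrative assumptions, not something from the thesis. Other citation styles would need a different pattern.

```python
import re
from collections import Counter

# Assumption: author-year citations. Only the first author's surname is
# captured; "et al." or a single co-author after "&"/"and" is skipped over.
CITATION = re.compile(
    r"\b([A-Z][a-zA-Z'-]+)"                                 # first author's surname
    r"(?:\s+et\s+al\.?|\s+(?:&|and)\s+[A-Z][a-zA-Z'-]+)?"   # optional co-author part
    r"(?:,\s*|\s*\()"                                       # ", " or " (" before the year
    r"(?:19|20)\d{2}"                                       # four-digit year
)


def top_cited_authors(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequently cited first authors with their counts."""
    counts = Counter(match.group(1) for match in CITATION.finditer(text))
    return counts.most_common(n)


# Hypothetical sample text, only for demonstration.
sample = (
    "Smith (2020) showed X. This was confirmed later (Smith, 2020; Lee, 2019). "
    "Smith et al. (2021) and Lee and Park (2018) disagree (Garcia & Lopez, 2017)."
)
for author, count in top_cited_authors(sample):
    print(f"{author}\t{count}")
```

Because the counting is done by code rather than by the model, the totals don't depend on the model keeping track. Expect some false positives (for example a capitalized word directly followed by a comma and a year), so it is worth spot-checking the top results against a search in Word.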
