Most suitable model to parse scientific references/citations

I am using the "gpt-3.5-turbo" model to parse scientific references and extract authors, title, journal, volume, issue, pages, and DOI. The results are quite accurate; however, it is very slow, and I was wondering if I could manage the same task with any other model?

ref = """5. Clarysse, M., Wilmer, P.A., Debaveye, Y., Laleman, W., Devos, T., Canovai, E., Verslype, C., Van der Merwe, S., van Malenstein, H., Nevens, F., Maleux, G., Sainz Barriga, M., Monbaliu, D., Pirenne, J. (2021). Mixed Physiological and Non-physiological Portal Inflow as an Alternative to Multivisceral Transplantation in a Patient with Diffuse Splanchnic Thrombosis. In: Transplantation: vol. 105 (7S), (Abstract No. P-84), (S93-S93). Presented at the 17th Congress of the Intestinal Rehabilitation & Transplant Association, Auckland, New Zealand, 30 Jun 2021-02 Jul 2021. doi: 10.1097/01.tp.0000758140.57972.21"""

prompt = ("""parse the given reference and extract authors, title, journal, proceeding, publication year, volume, issue,
pages and doi. present them in json format. here is the reference: """ + ref)

response = openai.ChatCompletion.create(
    model=model_engine,
    messages=[{"role": "user", "content": prompt}],
    max_tokens=1000,
)
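Since the prompt asks for JSON, the reply content can be loaded straight into a Python dict. A minimal sketch, assuming the model answered with the JSON object shown (the sample reply here is made up; real output varies, and the model sometimes wraps the JSON in extra text, so it helps to slice out the outermost braces first):

```python
import json

# Hypothetical reply content pulled from the ChatCompletion response,
# e.g. response["choices"][0]["message"]["content"]
reply = '{"authors": ["Clarysse, M.", "Wilmer, P.A."], "doi": "10.1097/01.tp.0000758140.57972.21"}'

# Slice from the first "{" to the last "}" in case the model added prose around the JSON.
start, end = reply.find("{"), reply.rfind("}") + 1
record = json.loads(reply[start:end])
print(record["doi"])  # prints 10.1097/01.tp.0000758140.57972.21
```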


I think this is a good idea, because the "cheaper" models are also faster at classification tasks such as yours. From my experience, what you are looking for is between babbage and curie. I'd advise starting off with babbage first :slight_smile:



Lovely! From a few trials, babbage seems to give more complete results:

response = openai.Completion.create(
    engine="text-babbage-001",
    prompt=f"Extract the title, authors, and journal name, proceeding, publication year, volume, issue, pages and doi from the following reference:\n{ref}\nTitle:",
    max_tokens=1024,
    n=1,
    stop=None,
    temperature=0.7,
)
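Because the prompt ends with "Title:", the completion continues from there, so the title comes back as the first line and the remaining fields as labelled lines. A minimal sketch of turning that into a dict (the sample completion text is made up; real output varies from run to run):

```python
# Hypothetical completion text, e.g. response["choices"][0]["text"]
completion = (
    "Mixed Physiological and Non-physiological Portal Inflow\n"
    "Authors: Clarysse, M.; Wilmer, P.A.\n"
    "DOI: 10.1097/01.tp.0000758140.57972.21"
)

lines = completion.splitlines()
# First line is the title (the "Title:" label stayed in the prompt).
fields = {"Title": lines[0].strip()}
# Remaining lines are "Label: value" pairs; split on the first colon only,
# so colons inside the value (e.g. in a DOI URL) survive.
for line in lines[1:]:
    if ":" in line:
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
print(fields["DOI"])  # prints 10.1097/01.tp.0000758140.57972.21
```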
