Fine-tuning GPT-3.5 to generate patent claims, taking the entire patent document (except the claims) as input. How do I handle token limits?

So here’s the problem I’d like to solve. I pick a random patent from Google Patents, give it to ChatGPT, and ask it to generate claims. It does a bad job, so I thought I could fine-tune it for this task. But the input context is so large that the model will never see the whole document, and in most cases not even the required output.

I need advice on how to design my instruction dataset for this use case, and how to handle the token limitation.

Well, I had a similar problem, though I don’t think I was dealing with as many tokens as patents involve. The best I can tell you is to do some data preprocessing: use natural language processing to strip out unnecessary information and pass only what is crucial for your purposes. You can also use the tiktoken library to check the token counts of your queries.