Fine-tuning GPT-3.5 to generate patent claims, taking the entire patent document (except the claims) as input. How do I handle token limits?

So here’s the problem I’d like to solve. I pick a random patent from Google Patents, give it to ChatGPT, and ask it to generate claims. It does a bad job, so I thought I could fine-tune it for this task. But the input context is so large that the model will never see the whole document, and in most cases not even the required output.

I need advice on how to design my instruction dataset for this use case, and how to handle the token limitation.

Well, I had a similar problem, though I don’t think I was dealing with as many tokens as patents involve. The best I can tell you is to do some data preprocessing: use natural language processing to strip out unnecessary information and pass only what is crucial for your purposes. You can also use the tiktoken library to check the token counts of your queries.