What is the best way to get the OpenAI API to respond with more specific and statistical answers related to financial markets?

I am building an app related to financial markets, and I want the OpenAI API to respond with more specific knowledge of the events that have happened in financial markets, stating facts and prices of various stocks. I know for sure that we need to fine-tune the base model to make it aware of these events.
For training, we have some news source APIs that fetch the latest news and events daily, and I also have a lot of other training data, including graphs (images), PDFs, etc., that I want to feed into ChatGPT for it to learn from.
First of all, my question is: which approach is better for my use case? Should I use the RAG approach, the fine-tuning approach, or maybe a hybrid approach that uses both?

Secondly, I saw in the documentation that fine-tuning can only be done using JSONL files containing prompt/completion pairs. Another concern is that creating these JSONL files ourselves would require a lot of manual effort, so I was thinking of automating the task with a Python script that creates the JSONL file from the data I provide. Is this a valid approach? A script is going to ignore the semantics of the data in a file, and there won't be meaningful prompt/completion pairs, because obviously a script wouldn't know what questions to put in the file.
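For illustration, this is the kind of rough script I have in mind (just a sketch; `load_articles()` is a stand-in for however I load my news data, and the prompt generation is deliberately naive):

```python
import json

def load_articles():
    # Stand-in for my real data loading; in practice these would come from the news APIs.
    return [
        {"ticker": "AAPL", "body": "Example article text about AAPL ..."},
        {"ticker": "TSLA", "body": "Example article text about TSLA ..."},
    ]

def build_examples(articles):
    # This is exactly my worry: the script has no understanding of the
    # semantics of each article, so the generated questions are generic.
    for article in articles:
        yield {
            "messages": [
                {"role": "system", "content": "You are a financial markets assistant."},
                {"role": "user", "content": f"What happened recently with {article['ticker']}?"},
                {"role": "assistant", "content": article["body"]},
            ]
        }

with open("training_data.jsonl", "w") as f:
    for example in build_examples(load_articles()):
        f.write(json.dumps(example) + "\n")
```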

Anyone had a similar experience? Please share your thoughts. Thanks

Hi @zainanwar6234 and welcome to the community!

Since markets are dynamic and you want up-to-date knowledge, fine-tuning is not the best approach. Also, the models inherently have a good grasp of macroeconomics, so they can respond appropriately given the right data and context.

I would start with a simple approach: for a given ticker, company, or sector, simply fetch the latest (or historical) data from your data sources and combine it with a nice tight system prompt to produce the necessary output. You can most likely do this in a single API call.
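Roughly something like this (a minimal sketch using the current Python SDK; `fetch_latest_data()` is a placeholder for whatever your news/price APIs return, and the model name is just an example):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def fetch_latest_data(ticker: str) -> str:
    # Placeholder for your own data sources -- return headlines, prices,
    # filings, etc. as plain text for the given ticker.
    return f"Latest headlines and prices for {ticker} ..."

def market_summary(ticker: str) -> str:
    context = fetch_latest_data(ticker)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a financial markets analyst. Answer using only "
                    "the data provided below, and cite concrete figures.\n\n"
                    f"DATA:\n{context}"
                ),
            },
            {"role": "user", "content": f"Summarise the latest situation for {ticker}."},
        ],
    )
    return response.choices[0].message.content

print(market_summary("AAPL"))
```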

If you want more finesse, e.g. focusing on different aspects like alpha, derivative insights, or sentiment, I would just define functions with appropriate system prompts and data sources for each of those, as in the sketch below.
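For example (reusing the `client` and the `fetch_latest_data()` placeholder from the previous sketch; the prompts here are purely illustrative):

```python
ASPECT_PROMPTS = {
    "sentiment": "You analyse news sentiment for a given ticker based on the data provided.",
    "derivatives": "You analyse options flow and implied volatility based on the data provided.",
    "alpha": "You look for potential alpha signals in the data provided.",
}

def analyse(ticker: str, aspect: str) -> str:
    # One function per aspect: same pattern, different system prompt and data source.
    context = fetch_latest_data(ticker)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"{ASPECT_PROMPTS[aspect]}\n\nDATA:\n{context}"},
            {"role": "user", "content": f"Give me the {aspect} view on {ticker}."},
        ],
    )
    return response.choices[0].message.content
```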

Hope that helps!


What is fine-tuning for, then? And can we somehow increase the knowledge base of the model? I am talking about the specific model we are using through our API key. I am not referring to the RAG approach, whereby a system prompt is sent along with information/articles relevant to the prompt; rather, I want to know whether we can train the model just by feeding it information/articles relevant to our domain-specific use case.

OpenAI provides a very good guide on fine-tuning, and describes when you should use it here. In essence, the recommendation is to really try prompting strategies and problem breakdown first, along the lines of what I described previously.

The trouble with fine-tuning is that you are trying to add your custom knowledge to the model, and that knowledge is guaranteed to be many orders of magnitude smaller than the knowledge in the base model itself. If your custom data is not out of distribution, and you don't have a huge amount of it at high quality, the risk is that it will simply be "lost" among the model's billions of weights. Since GPT models are trained on massive web crawls, they already possess a significant amount of specialized knowledge across engineering, the sciences, legal, and definitely finance and economics. What fine-tuning can do is, for example, change the style and format of responses (if that is tricky to steer with prompting), but adding new knowledge is very hard to get right. Even the likes of Bloomberg tried, and failed.

Just to clarify: we are going to have a specific, precise format for our queries (defined system prompts), and we will use fine-tuning to train the LLM to give us outputs in the desired format. However, as the prompt/completion examples in the JSONL files, we intend to use new information that corresponds one-to-one to the prompts users will give and the answers they should get. I hope this is clear, and if so, the question is: do you think this approach is likely to enable us to generate the answers we want on the new information we provided in training?

Ok, I see. My answer is: possibly :sweat_smile:! I don't dare give a more precise answer than that; a lot of this is just pure alchemy, and you just have to try it out and evaluate. If you are sure prompting strategies don't get you there, and you have plenty of fine-tuning samples (thousands of them), then why not?