Need Help with API / Not Sure Whether to Use Fine-Tuning

ChipG · August 26, 2024, 11:33pm

Hello!

I need to be able to upload company training docs and then have them cross-referenced against transcripts of conversations.

I don't want to pay for input tokens every time. It will be a high volume of transcriptions and docs, so how would I do this?

Would fine-tuning work? And if so, would I only pay for the tokens to upload the docs one time?

Or is there a better way to go about this?

Thank you!

dignity_for_all · August 27, 2024, 12:22am

I believe that the training documents to be transcribed are image files.

Fine-tuning cannot be used to reduce the cost of transcribing image files with the vision features.

When transcribing image files, the fees listed in OpenAI’s pricing apply.

That said, if you save the transcriptions in advance, you only need to run the transcription process once.

It may be difficult to extract only the text from images. In such cases, some post-processing may be necessary.

ChipG · August 27, 2024, 12:24am

Image files? Are you an automated AI bot? No, they are not image files; it's plain text in PDF, DOC, and TXT files. Where are you getting image files from?

dignity_for_all · August 27, 2024, 12:30am

I’m sorry, I overlooked the mention of “conversation.”

If you use Whisper for transcription, you can self-host it because it is an open-source model, which means you would only incur the hosting costs.

ChipG · August 27, 2024, 12:33am

NO, we already have the conversations transcribed…

We want to upload a large company training document (PDF or TXT) and have it checked against the phone call transcriptions we already have (also TXT), without having to pay to upload the large training document again for every transcription.

Please read my post.

Hey everyone, I'm new here, but are there bots that try to answer these questions?

dignity_for_all · August 27, 2024, 12:43am

If both the company training document (PDF or TXT) and your existing phone call transcriptions (TXT) are text, the right cross-referencing process depends on your specific needs, and approaches such as RAG may be worth considering.
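To make the RAG suggestion concrete, here is a minimal sketch of the retrieval step in retrieval-augmented generation (RAG). It assumes the training document is already plain text. The bag-of-words cosine scoring is only a stand-in for a real embeddings model, and all function and class names are hypothetical. The key idea is that the training document is split and indexed once, and each transcript is then paired with only its few most relevant chunks, not the whole document. Sending the resulting prompt to a model is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TrainingDocIndex:
    """Built once per training document, then reused for every transcript."""

    def __init__(self, document: str, max_words: int = 200):
        self.chunks = chunk_text(document, max_words)
        self.vectors = [vectorize(c) for c in self.chunks]

    def top_chunks(self, transcript: str, k: int = 3) -> list[str]:
        """Return the k chunks most similar to the transcript."""
        query = vectorize(transcript)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(query, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(index: TrainingDocIndex, transcript: str, k: int = 3) -> str:
    """Combine only the relevant excerpts with one transcript."""
    context = "\n---\n".join(index.top_chunks(transcript, k))
    return (
        f"Company training excerpts:\n{context}\n\n"
        f"Call transcript:\n{transcript}\n\n"
        "Does the call follow the training excerpts above? List any deviations."
    )


if __name__ == "__main__":
    doc = ("greet the caller by name and thank them. never promise a refund "
           "without manager approval. confirm the shipping address before closing.")
    index = TrainingDocIndex(doc, max_words=8)
    print(build_prompt(index, "I promised the customer a refund", k=1))
```

In a real setup, you would compute embeddings for the document chunks once and store them, so the per-transcript cost is the transcript plus a handful of excerpts instead of the full training document.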

Additionally, some tools let you cache the LLM's inputs and outputs, which could lead to cost savings.
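As an illustration of the caching idea, here is a minimal sketch of exact-match output caching. It assumes `call_model` is a hypothetical function you supply that sends a prompt to whatever model you use and returns its text. The cache is an in-memory dict keyed by a SHA-256 hash of the prompt, so you only pay for a prompt the first time it is seen.

```python
import hashlib
from typing import Callable


class CachedLLM:
    """Wraps a model-calling function and reuses stored outputs for identical prompts."""

    def __init__(self, call_model: Callable[[str], str]):
        self.call_model = call_model  # hypothetical: your own API-calling function
        self.cache: dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        """Hash the prompt so long inputs make compact cache keys."""
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key not in self.cache:
            # Cache miss: this is the only point where the model is actually called.
            self.cache[key] = self.call_model(prompt)
        return self.cache[key]
```

This only saves money when the exact same prompt repeats. With many distinct transcripts, most prompts will differ, so caching complements retrieval rather than replacing it.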
