Need Help with API / Not Sure Whether to Use Fine-Tuning

Hello!

I need to be able to upload company training docs and then have them cross-referenced against transcripts of conversations.

I do not want to pay for input tokens each time. There will be a high volume of transcriptions and docs. How would I do this?

Would fine-tuning work? If so, would I only pay for the tokens to upload the docs one time?

Or is there a better way to go about this?

Thank you! :slight_smile:

I believe that the training documents to be transcribed are image files.

Fine-tuning cannot be used to reduce the cost of transcribing image files with vision features.

When transcribing image files, the fees listed in OpenAI’s pricing apply.

That being said, if you save the transcriptions in advance, you don’t have to run the transcription process multiple times; saving once is sufficient.

It may be difficult to extract only the text from images. In such cases, some post-processing may be necessary.

Image files? Are you an automated AI bot? No, they are not image files, it’s plain text in pdf, doc, txt. Where are you getting image files from?

I’m sorry, I overlooked the mention of “conversation.”

If you use Whisper for transcription, you can host it yourself as an open-source model, which means you would only incur the hosting costs.

NO, we already have the conversations transcribed…

We want to upload a large company training document (PDF or TXT) and have it checked against the phone call transcriptions we already have (also TXT), without having to upload and pay for the large training doc for every transcription.

Please read my post.

Hey everyone, I’m new here but do they have bots that try to answer these questions?

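If both the company training document (PDF or TXT) and the phone call transcriptions (TXT) are already plain text, the right way to cross-reference them depends on your specific needs. One approach to consider is RAG (retrieval-augmented generation), where only the parts of the training document relevant to each transcript are sent to the model, rather than the whole document every time.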
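To make the RAG idea a bit more concrete, here is a minimal sketch of the retrieval step only. It is not a full RAG pipeline: real setups usually score chunks with embeddings, while this stand-in uses simple keyword weighting from the standard library. The function names (`chunk_document`, `top_chunks`) and the chunk size are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(text, max_words=200):
    """Split the training document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def top_chunks(chunks, transcript, k=3):
    """Return the k chunks sharing the most distinctive words with the transcript.

    Assumption: keyword overlap weighted by rarity stands in for the
    embedding similarity a production RAG system would normally use.
    """
    n = len(chunks)
    chunk_counts = [Counter(tokenize(c)) for c in chunks]

    # Document frequency: in how many chunks does each word appear?
    df = Counter()
    for counts in chunk_counts:
        df.update(counts.keys())

    query = set(tokenize(transcript))
    scored = []
    for idx, counts in enumerate(chunk_counts):
        # Words that appear in fewer chunks get a higher weight.
        score = sum(counts[t] * math.log(1 + n / df[t]) for t in query if t in counts)
        scored.append((score, idx))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[idx] for _, idx in scored[:k]]
```

With something like this, each transcript would be sent to the model together with only the few chunks returned by `top_chunks`, so the input tokens per request stay roughly constant no matter how large the training document is.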

Additionally, some tools can cache the inputs and outputs of the LLM, so a request that has already been answered does not have to be paid for again, which could lead to cost savings.
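As a rough illustration of the caching idea, here is a minimal sketch of an application-side cache keyed on a hash of the prompt. The `ResponseCache` class and the `compute` callback (whatever function actually calls the model) are hypothetical names, not part of any particular tool. Note that a cache like this only saves money when exactly the same prompt is sent again; it does not reduce the cost of a new transcript paired with the same training document.

```python
import hashlib
import json
from pathlib import Path


class ResponseCache:
    """Stores model outputs keyed by a hash of the prompt text."""

    def __init__(self, path=None):
        # Assumption: an optional JSON file is enough for persistence here.
        self.path = Path(path) if path else None
        self.data = {}
        if self.path and self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))

    @staticmethod
    def key(prompt):
        """Hash the prompt so long inputs become short, fixed-size keys."""
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt, compute):
        """Return the cached output, calling compute(prompt) only on a miss."""
        k = self.key(prompt)
        if k not in self.data:
            self.data[k] = compute(prompt)
            if self.path:
                self.path.write_text(json.dumps(self.data), encoding="utf-8")
        return self.data[k]
```

In use, `compute` would be the function that sends the prompt to the model and returns its text output, so rerunning a batch that includes already-processed transcripts only pays for the new ones.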