I wrote an mp3 → text transcription custom action. After much gnashing of teeth, I was able to get it working (e.g., on a simple GET).

For the transcription, I wanted to use the upload button in the Chat interface to upload the mp3, then say "transcribe this mp3 file I just uploaded". This does not seem to be allowed, which seems weird to me, because during a normal chat session I can merrily upload files. I guess I could work with URLs, but uploading the mp3 with a button click seemed the most elegant way. Am I missing something about how file upload and use works? Thank you.
It sounds like you are talking about a GPT in ChatGPT.
However, you can refer to the file types supported by the API's Assistants endpoint. mp3: not there.

An upload is placed into the retrieval system for augmenting the AI's knowledge, or placed into the Python sandbox. Neither of those locations allows the AI to upload the file somewhere else. The AI language model cannot recite a large binary file to an action.
Additionally, the list of accepted MIME types:
```json
{
  "accepted_mime_types": [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/x-script.python",
    "text/x-csharp",
    "text/x-sh",
    "text/markdown",
    "text/x-java",
    "application/pdf",
    "text/plain",
    "text/x-typescript",
    "text/javascript",
    "text/x-c",
    "text/html",
    "application/x-latext",
    "text/x-php",
    "text/x-tex",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "text/x-ruby",
    "application/json",
    "text/x-c++"
  ]
}
```
Thank you for the reply. I don't understand your answer. I have an OpenAPI schema for my GPT action:
```yaml
openapi: 3.1.0
info:
  title: Tim, The Audio To Text Service
  version: 0.0.1
servers:
  - url: https://odd-plants-lay.loca.lt
paths:
  /:
    get:
      summary: Root Endpoint
      description: Returns a greeting message from the server.
      operationId: root
      responses:
        '200':
          description: A JSON object with a greeting message.
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    example: "Hello from Tim!"
  /transcribe/mp3:
    post:
      summary: Transcribe MP3 file
      operationId: transcribeAudio
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: MP3 file to be transcribed.
                model_name:
                  type: string
                  description: Key corresponding to the transcription model to use.
                  enum:
                    - tiny
                    - tiny.en
                    - base
                    - base.en
                    - small
                    - small.en
                    - medium
                    - medium.en
                    - large
                    - large-v2
                compute_type:
                  type: string
                  description: Key indicating the compute type for the transcription process.
                  enum:
                    - default
                    - float16
                    - float32
              required:
                - file
      responses:
        '200':
          description: Transcription initiated successfully.
          content:
            application/json:
              schema:
                type: object
                properties:
                  task_id:
                    type: string
                    description: Unique identifier for the transcription task.
                  message:
                    type: string
                    description: Confirmation message.
        '400':
          description: Invalid request parameters.
  /status/{task_id}/stream:
    get:
      summary: Get updates for a transcription task via SSE
      operationId: getTranscriptionStatus
      parameters:
        - in: path
          name: task_id
          required: true
          schema:
            type: string
          description: Unique identifier for the transcription task.
      responses:
        '200':
          description: Stream of status updates.
          content:
            text/event-stream:
              schema:
                type: string
        '404':
          description: Task not found.
  /download/{taskid}:
    get:
      summary: Download a transcription file
      operationId: downloadTranscriptionFile
      parameters:
        - in: path
          name: taskid
          required: true
          schema:
            type: string
          description: Identifier of the transcription task whose file to download.
      responses:
        '200':
          description: The transcription file is returned.
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
          headers:
            Content-Disposition:
              schema:
                type: string
              description: Indicates the filename that the downloaded file should have.
        '404':
          description: File not found.
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    example: "File not found"
```
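For reference, this is roughly how those endpoints are exercised from a plain script outside ChatGPT (a minimal sketch using the `requests` library; the file name `episode.mp3` and the model choice are placeholders):

```python
import requests

BASE_URL = "https://odd-plants-lay.loca.lt"  # tunnel URL from the spec above

# 1) Upload an MP3 for transcription (multipart/form-data, per /transcribe/mp3).
with open("episode.mp3", "rb") as f:  # placeholder file name
    resp = requests.post(
        f"{BASE_URL}/transcribe/mp3",
        files={"file": ("episode.mp3", f, "audio/mpeg")},
        data={"model_name": "base.en", "compute_type": "default"},
    )
resp.raise_for_status()
task_id = resp.json()["task_id"]

# 2) Follow the SSE status stream for that task until the server closes it.
with requests.get(f"{BASE_URL}/status/{task_id}/stream", stream=True) as stream:
    for line in stream.iter_lines():
        if line.startswith(b"data:"):
            print(line.decode().removeprefix("data:").strip())
```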
When I ask my custom GPT (with an action based on this OpenAPI spec) for a transcript, I want it to get to the mp3 file via the paperclip button, just like I do in "normal" chats. Only this time, it should call the transcribe endpoint with the file parameter set to the mp3 I uploaded with the paperclip button (again, like in any other conversation). Thank you.
The AI cannot receive an MP3 into its context or move binary files around; it can only produce language. Uploaded files are provided to specific internal services, which the AI can then employ.
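Since you mention URLs: one workaround is to have the action accept a link instead of a binary, and let your server fetch the audio itself. A minimal sketch, assuming a FastAPI backend like yours (the /transcribe/url route and its names are hypothetical):

```python
import httpx
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, HttpUrl

app = FastAPI()

class UrlTranscribeRequest(BaseModel):
    url: HttpUrl                 # publicly reachable link to the mp3
    model_name: str = "base.en"  # same model keys as the multipart endpoint

@app.post("/transcribe/url")
async def transcribe_from_url(req: UrlTranscribeRequest):
    # The server, not the language model, downloads the binary.
    async with httpx.AsyncClient(follow_redirects=True) as client:
        resp = await client.get(str(req.url))
    if resp.status_code != 200:
        raise HTTPException(status_code=400, detail="could not fetch the audio")
    audio_bytes = resp.content
    # ...hand audio_bytes to the transcription backend here...
    return {"message": f"received {len(audio_bytes)} bytes"}
```

The GPT only has to emit a short URL string to the action, which is well within what the model can reliably produce.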
Is there any point in the AI knowing the text that has been transcribed?
Within a GPT, you could simply link to your web site, which, as you describe it, gives away AI services at your expense.
Yes. The code doesn't show it yet (I'm building incrementally), but the next step is to "translate" the transcription into something closer to a scientific article that a 12th grader would be comfortable with. The transcripts are from a scientific podcast on growing plants. The goal is to take the conversational tone of the podcast and turn the ideas and facts postulated in it into a page of content on a specific area of growing plants (as covered in the podcast/transcript). The scenario: a new podcast episode comes in, I ask for transcription, then "translation". I wanted to do this as a custom GPT because I am a ChatGPT Plus user and it enhances my Plus experience. Currently I am cobbling each step together; I wanted a custom GPT focused on this one task, with my APIs to help. Thank you.
I then take the "translation" and fit it into my "knowledge bank", which is an Obsidian vault. I then index the contents of the vault and do RAG over it…
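For the indexing step, a minimal sketch assuming a vault of markdown notes and the OpenAI embeddings API (the vault path and model name are illustrative):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
VAULT = Path("~/ObsidianVault").expanduser()  # illustrative vault location

# Read every markdown note in the vault and embed it for retrieval.
notes = [(p, p.read_text(encoding="utf-8")) for p in VAULT.rglob("*.md")]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[text for _, text in notes],  # real notes may need chunking first
)
index = [
    {"path": str(path), "vector": item.embedding}
    for (path, _), item in zip(notes, response.data)
]
```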
Also, I wanted to better understand the whole FastAPI + uvicorn + SSE + OpenAPI stack, as well as the limitations of writing a custom-action GPT. So far, the biggest limitation (besides file upload, which sadly works in ChatGPT but not for my custom GPT's action) is debugging. Oh my goodness! There is no real debugging support other than showing [debug…]; this is a serious challenge.
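One server-side mitigation is to log validation failures yourself, since ChatGPT only surfaces that opaque [debug…] marker. A minimal sketch for a FastAPI backend (the logger name is arbitrary):

```python
import logging

from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("action-debug")

app = FastAPI()

@app.exception_handler(RequestValidationError)
async def log_validation_errors(request: Request, exc: RequestValidationError):
    # When the GPT sends a malformed request, record exactly what failed;
    # the ChatGPT UI itself will only show the opaque [debug] marker.
    logger.error("Validation failed on %s %s: %s",
                 request.method, request.url.path, exc.errors())
    return JSONResponse(status_code=422, content={"detail": exc.errors()})
```

Tailing the uvicorn log alongside this at least gives a request-level view of what the GPT actually sent.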