Using the paperclip button to upload a file to a custom action

I wrote an mp3 → text transcription custom action. After much gnashing of teeth, I got it working (e.g., on a simple GET /). For transcribing, I wanted to use the upload (paperclip) button in the Chat interface to upload the mp3, then say “transcribe this mp3 file I just uploaded”. This does not seem to be allowed, which seems weird to me, because during a normal chat session I can merrily upload files. I guess I could work with URLs, but uploading the mp3 with a button click seemed the most elegant way. Am I missing something about how file upload and use works? Thank you.

It sounds like you are talking about a GPT in ChatGPT.

However, you can refer to the list of file types supported by the API’s Assistants.

mp3: not there.

An upload is placed either into the retrieval system, to augment the AI’s knowledge, or into the Python sandbox. Neither of those locations allows the AI to send the file somewhere else.

The AI’s language output cannot recite a large binary file into an action call.
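What an action can receive is plain text, and a URL is plain text. So one workaround is to have the action accept a URL and let your server fetch the bytes itself. A minimal sketch, assuming a FastAPI server; the /transcribe/url endpoint and field names here are hypothetical, not part of your existing service:

import httpx
from fastapi import FastAPI
from pydantic import BaseModel, HttpUrl

app = FastAPI()

class TranscribeRequest(BaseModel):
    file_url: HttpUrl        # the model can emit a URL as ordinary text
    model_name: str = "base"

@app.post("/transcribe/url")
async def transcribe_from_url(req: TranscribeRequest):
    # The server, not the model, moves the binary.
    async with httpx.AsyncClient(follow_redirects=True) as client:
        resp = await client.get(str(req.file_url))
        resp.raise_for_status()
        audio_bytes = resp.content
    # ... hand audio_bytes to the transcription backend here ...
    return {"task_id": "example", "message": "Transcription started"}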

Additionally, the accepted MIME types for uploads:

{
  "accepted_mime_types": [
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "text/x-script.python",
    "text/x-csharp",
    "text/x-sh",
    "text/markdown",
    "text/x-java",
    "application/pdf",
    "text/plain",
    "text/x-typescript",
    "text/javascript",
    "text/x-c",
    "text/html",
    "application/x-latext",
    "text/x-php",
    "text/x-tex",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "text/x-ruby",
    "application/json",
    "text/x-c++"
  ]
}

Thank you for the reply. I don’t understand your answer. I have an OpenAPI schema for my GPT action:

openapi: 3.1.0
info:
  title: Tim, The Audio To Text Service
  version: 0.0.1
servers:
  - url: https://odd-plants-lay.loca.lt
paths:
  /:
    get:
      summary: Root Endpoint
      description: Returns a greeting message from the server.
      operationId: root
      responses:
        '200':
          description: A JSON object with a greeting message.
          content:
            application/json:
              schema:
                type: object
                properties:
                  message:
                    type: string
                    example: "Hello from Tim!"
  /transcribe/mp3:
    post:
      summary: Transcribe MP3 file
      operationId: transcribeAudio
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  type: string
                  format: binary
                  description: MP3 file to be transcribed.
                model_name:
                  type: string
                  description: Key corresponding to the transcription model to use.
                  enum:
                    - tiny
                    - tiny.en
                    - base
                    - base.en
                    - small
                    - small.en
                    - medium
                    - medium.en
                    - large
                    - large-v2
                compute_type:
                  type: string
                  description: Key indicating the compute type for the transcription process.
                  enum:
                    - default
                    - float16
                    - float32
              required:
                - file
      responses:
        '200':
          description: Transcription initiated successfully.
          content:
            application/json:
              schema:
                type: object
                properties:
                  task_id:
                    type: string
                    description: Unique identifier for the transcription task.
                  message:
                    type: string
                    description: Confirmation message.
        '400':
          description: Invalid request parameters.
  /status/{task_id}/stream:
    get:
      summary: Get updates for a transcription task via SSE
      operationId: getTranscriptionStatus
      parameters:
        - in: path
          name: task_id
          required: true
          schema:
            type: string
          description: Unique identifier for the transcription task.
      responses:
        '200':
          description: Stream of status updates.
          content:
            text/event-stream:
              schema:
                type: string
        '404':
          description: Task not found.
  /download/{task_id}:
    get:
      summary: Download a transcription file
      operationId: downloadTranscriptionFile
      parameters:
        - in: path
          name: task_id
          required: true
          schema:
            type: string
          description: Unique identifier of the transcription task whose output file to download.
      responses:
        '200':
          description: The transcription file is returned.
          content:
            application/octet-stream:
              schema:
                type: string
                format: binary
          headers:
            Content-Disposition:
              schema:
                type: string
              description: Indicates the filename that the downloaded file should have.
        '404':
          description: File not found.
          content:
            application/json:
              schema:
                type: object
                properties:
                  error:
                    type: string
                    example: "File not found"

When I ask my custom GPT (with an action based on this OpenAPI spec) for a transcript, I want the GPT to get the mp3 file from me clicking the paperclip button, like I do in “normal” chats. Only this time, take the file and call the transcribe endpoint, passing the uploaded mp3 as the file parameter (again, like I do for any other conversation). The request I am hoping the GPT could make on my behalf looks like the sketch below. Thank you.
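Outside of ChatGPT, the endpoint works fine when called from code, e.g. (a sketch; the file name is a placeholder):

import httpx

# Multipart POST matching the /transcribe/mp3 operation in the spec above.
with open("episode.mp3", "rb") as f:  # placeholder file name
    resp = httpx.post(
        "https://odd-plants-lay.loca.lt/transcribe/mp3",
        files={"file": ("episode.mp3", f, "audio/mpeg")},
        data={"model_name": "base", "compute_type": "default"},
    )
resp.raise_for_status()
print(resp.json())  # {"task_id": "...", "message": "..."}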

The AI cannot receive an MP3 into context or move around binary files.

It can produce language.

Uploaded files are provided to specific internal services that can then be employed by the AI.

Is there any point in the AI knowing the text that has been transcribed?

Within a GPT, you could simply link out to your web site, the one that (as you describe) gives away AI services for free at your expense, rather than routing the file through an action.

Yes. The code doesn’t show it yet (I’m building incrementally), but the next step is to “translate” the transcription into something closer to a scientific article that a 12th grader would be comfortable with. The transcripts are from a scientific podcast on growing plants. The goal is to take the conversational tone of the podcast and turn it into a page of content: an article on a specific area of growing plants, built from the ideas and facts put forward in the episode. The scenario: a new episode comes in, I ask for transcription, then “translation”. I wanted to do this as a custom GPT because I am a ChatGPT Plus user and it enhances my Plus experience. Currently I am cobbling each step together; I wanted a custom GPT focused on this task, with my APIs to help. Thank you.

I then take the “translation” and fit it into my “knowledge bank”, which is an Obsidian vault. I then index the contents of the vault and do RAG over it, roughly like the sketch below.
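The indexing step is nothing fancy; here is a toy sketch of the idea, with TF-IDF standing in for my real embedding setup and the vault path as a placeholder:

from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

VAULT = Path("~/ObsidianVault").expanduser()  # placeholder path
docs = {p: p.read_text(encoding="utf-8") for p in VAULT.rglob("*.md")}

# Build one vector per note in the vault.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs.values())

def query(text: str, k: int = 3):
    """Return the k vault notes most similar to the query text."""
    scores = cosine_similarity(vectorizer.transform([text]), matrix)[0]
    ranked = sorted(zip(docs.keys(), scores), key=lambda t: -t[1])
    return ranked[:k]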

Also, I wanted to better understand the whole FastAPI + uvicorn + SSE + OpenAPI stack, as well as the limitations of writing a custom GPT with actions. So far, the biggest limitation (besides file upload, which sadly works in ChatGPT but not for my custom GPT’s action) is debugging. Oh my goodness! There is no real debugging support other than showing [debug…]; this is a serious challenge. One thing that has helped a little on my side is logging every incoming request, as in the sketch below.
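A request-logging middleware on the FastAPI side lets me at least see what the GPT actually sends (a sketch; adjust the log formatting to taste):

import logging
from fastapi import FastAPI, Request

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("action-debug")

app = FastAPI()

@app.middleware("http")
async def log_requests(request: Request, call_next):
    # Log what the GPT action sends before the route handler runs.
    log.info("%s %s headers=%s",
             request.method, request.url.path, dict(request.headers))
    response = await call_next(request)
    log.info("-> %s %s status=%s",
             request.method, request.url.path, response.status_code)
    return response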