How do AI scribes for healthcare work?

MD here. AI scribes are flooding the healthcare sector. Can anyone explain how they work? My understanding is that transcription software transcribes the conversation and then a GPT model summarizes it into a note based on a prompt. Are all these companies using the same formula? What differentiates one AI scribe from another? Is it fair to assume that most companies are building their AI scribe on Microsoft Azure because it offers a BAA?

Speech → Whisper → Text (Anyone can do that)

Speech → Whisper → GPT-4 → Text (This can do a lot more than just summarize)

Speech → Diarization → Whisper → GPT-4 → Text (Lots of different possibilities here)


Sales call → Diarization → Whisper → GPT-4 (Verifying if certain things were said) → output (Adds a lot of value)
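To make the chains above concrete, here is a minimal Python sketch of the diarization → transcription → LLM flow. It is only an illustration, not how any particular vendor does it: `Segment`, `run_scribe`, and the `fake_*` stages are hypothetical names, and the stubs stand in for a real diarization model, Whisper, and GPT-4 so the structure can be run without any API keys.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    speaker: str  # label assigned by the diarization stage
    start: float  # seconds
    end: float    # seconds


# Each stage is a plain callable, so a real model or API can be swapped in.
Diarizer = Callable[[bytes], List[Segment]]
Transcriber = Callable[[bytes, Segment], str]
Summarizer = Callable[[str], str]


def run_scribe(audio: bytes, diarize: Diarizer, transcribe: Transcriber,
               summarize: Summarizer) -> str:
    """Speech -> diarization -> transcription -> LLM -> text."""
    lines = []
    for seg in diarize(audio):
        # One labelled line per speaker turn.
        lines.append(f"{seg.speaker}: {transcribe(audio, seg)}")
    transcript = "\n".join(lines)
    return summarize(transcript)


# Stub stages for illustration only (assumptions, not real model calls).
def fake_diarize(audio: bytes) -> List[Segment]:
    return [Segment("Doctor", 0.0, 2.0), Segment("Patient", 2.0, 5.0)]


def fake_transcribe(audio: bytes, seg: Segment) -> str:
    canned = {
        "Doctor": "What brings you in today?",
        "Patient": "I have had a cough for two weeks.",
    }
    return canned[seg.speaker]


def fake_summarize(transcript: str) -> str:
    # A real version would send the transcript plus a prompt to an LLM.
    return "NOTE DRAFT\n" + transcript


note = run_scribe(b"", fake_diarize, fake_transcribe, fake_summarize)
print(note)
```

Keeping each stage behind a simple function signature is one way to swap vendors or models later without touching the rest of the pipeline.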


Thanks for breaking it down. What is the difficulty level and time cost for creating a service like this? It seems fairly straightforward for someone with the technical expertise (not me).

The code should take less than an hour if you use something like Apps Script or Streamlit (thanks to GPT-4).

Prompts might take minutes to weeks.

Assuming you want to create a prototype or something for personal use.
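For a sense of what the prompt step involves, here is a rough Python sketch of assembling the instruction text that would be sent to the LLM together with the transcript. The function name `build_note_prompt`, the section names, and the wording of the instructions are all assumptions made up for illustration; getting this wording right for real notes is where the "minutes to weeks" goes.

```python
from typing import List


def build_note_prompt(transcript: str, sections: List[str],
                      style: str = "concise") -> str:
    """Assemble the instruction text sent to the LLM along with the transcript."""
    section_list = "\n".join(f"- {name}" for name in sections)
    return (
        f"You are drafting a clinical note in a {style} style.\n"
        "Use only information stated in the transcript; "
        "write 'not discussed' for any section with no supporting text.\n"
        f"Sections:\n{section_list}\n\n"
        f"Transcript:\n{transcript}"
    )


# Hypothetical transcript and section names, for illustration only.
prompt = build_note_prompt(
    "Doctor: Any allergies?\nPatient: Penicillin.",
    ["History", "Allergies", "Plan"],
)
print(prompt)
```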


You should be able to make a prototype of something like this fairly quickly, but I'd say at least 3 months or so to make something that can stand on its own legs.

There are lots of different services out there doing this, and not all of them are equally legitimate, as stuff like this does require a lot of legal paperwork :sweat_smile:

I'll call in my buddy @Foxalabs, as he knows more about this stuff than I do.


Speaking as a builder in the space since 2019 with Heidi:

It's easy now to make a toy prototype, but it's fairly hard from a product POV to make it truly exceptional and good enough for an every-visit proposition.

Things like templates, personalisation, macros, apps, etc. are all required to make it viable.