Google Launches MedASR, an Open Medical Speech-to-Text Model

In late 2025, Google released MedASR, an open-weight, medical-focused speech-to-text model, as part of its Health AI Developer Foundations program.

Unlike general-purpose automatic speech recognition (ASR) models, MedASR is trained specifically on medical audio using approximately 5,000 hours of de-identified data that includes physician dictations and clinical conversations across multiple specialties such as radiology, internal medicine, and family medicine.

According to Google, MedASR is intended for use in the healthcare and life sciences sector and is positioned as a “foundational model” for “efficient healthcare-based voice applications.” The company recommends it for medical dictation tasks involving specialized terminology as well as for transcribing physician-patient conversations.

The model is relatively lightweight and can be run locally, deployed via Google Vertex AI, or fine-tuned on proprietary data.

“While it performs well on general medical dictation, it can be fine-tuned to adapt to specific requirements that may fall outside its pre-training data,” Google said.

The company highlights customization options including accent adaptation, robustness to noisy audio, medical vocabulary expansion, and improved formatting, such as consistent handling of dates, times, and durations. Google also provides open notebooks for quick-start usage and fine-tuning.

Integration With Downstream AI Workflows

Beyond standalone transcription, Google positions MedASR as a speech-to-text layer within broader, multimodal healthcare AI workflows. By converting spoken clinical audio into text, MedASR enables downstream integration with large language models (LLMs) for documentation and summarization tasks.

In practice, MedASR can be used to transcribe a recorded patient visit or clinician dictation, after which a model such as MedGemma can use the transcript to draft clinician notes – including SOAP (Subjective, Objective, Assessment, Plan) notes — or summarize key symptoms and medications.

“MedASR allows developers to incorporate automatic speech recognition (ASR) capabilities, specifically tuned for the medical domain into their product,” Google said.

In published evaluations, Google reports that MedASR delivers significantly lower transcription error rates than widely used models such as OpenAI’s Whisper and even Google’s own Gemini 2.5 Pro and Gemini 2.5 Flash.

However, Google stresses that MedASR is not intended to be used without appropriate validation or adaptation. The company notes that outputs may contain transcription errors and should not be used directly for clinical diagnosis, patient management decisions, or treatment recommendations.

All outputs generated by MedASR are described as “preliminary” and requiring “independent verification, clinical correlation, and further investigation,” reflecting the growing regulatory scrutiny around AI transcription tools in healthcare settings.

Hyperscalers Push into Healthcare Voice AI

Google is not the only major technology provider investing in healthcare voice AI.

Microsoft has long positioned Dragon Medical One as a core product for “speech-driven clinical documentation,” with features like dictation and automatic accent detection. More recently, Microsoft has expanded this with Dragon Copilot, a voice AI assistant which also includes translation as a feature (TaaF).

OpenAI, meanwhile, is also pushing into healthcare voice and language AI. On January 8, 2026, the company introduced OpenAI for Healthcare, highlighting how teams can use ChatGPT for Healthcare to “draft clinical and administrative documentation” and “adapt patient-facing […] materials for readability and translation.”Shortly before that, the company also introduced ChatGPT Health which supports connecting medical records and wellness apps and includes features such as voice mode and dictation in health conversations.

Featured