
Nvidia researchers detail AI-powered clinical speech transcription system



At the Conference for Machine Intelligence in Medical Imaging 2020, which was held virtually this year, Nvidia researchers presented a paper describing an AI system that captures and transcribes patients’ speech in clinical settings. The system identifies clinical terms and maps them to entries in a standardized health database, tasks the researchers say could alleviate pressure on clinicians as they experience pandemic-related overwork.

The coauthors suggest that one potential use of the system is telemedicine, a field that has seen unprecedented demand during the coronavirus pandemic. In March, virtual health consultations grew by 50%, according to Frost and Sullivan research, with general online medical visits on course to hit 200 million this year.

At the core of the researchers’ system is a BERT-based language model pretrained in a self-supervised manner on a text dataset. (Self-supervised learning is a means of training models to perform tasks without providing labeled data.) Bio-Megatron, a model with 345 million parameters (the internal variables a model learns during training), ingested and learned patterns from 6.1 billion words extracted from PubMed, a search engine for abstracts on life sciences topics.
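To make the self-supervised idea concrete, here is a minimal sketch of how a BERT-style masking objective can derive training targets from raw text alone. It is an illustration, not the researchers' pipeline: the function name `mask_tokens`, the 15% masking rate, and the sample sentence are assumptions, and real BERT pretraining uses a more elaborate replacement scheme.

```python
import random

MASK = "[MASK]"  # placeholder token that hides a word from the model


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and keep the originals as labels.

    Hypothetical helper for illustration only. The labels come from the
    text itself, so no human annotation is needed.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # the model sees the mask ...
            labels.append(tok)    # ... and must predict the hidden word
        else:
            inputs.append(tok)
            labels.append(None)   # no prediction target at this position
    return inputs, labels


if __name__ == "__main__":
    sentence = "the patient was prescribed aspirin for chest pain".split()
    masked, targets = mask_tokens(sentence)
    print(masked)
    print(targets)
```

A model trained this way is scored only on the masked positions, which is how it can learn language patterns from billions of unlabeled words.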

After pretraining, the model was fine-tuned on a clinical natural language processing dataset created under a former National Institutes of Health (NIH)-funded National Center for Biomedical Computing agreement. It was then incorporated into an automatic speech recognition component that identifies words and checks them against concepts in the Unified Medical Language System (UMLS), an ontology developed by the NIH’s National Library of Medicine.
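The step of checking transcribed words against an ontology can be pictured with a toy lookup. This sketch assumes a tiny hand-made dictionary and a greedy longest-match rule; the names `TOY_CONCEPTS` and `link_concepts` and the `CONCEPT-00x` identifiers are invented placeholders, not real UMLS identifiers, and the system described in the paper relies on a trained model rather than a dictionary.

```python
# Made-up vocabulary standing in for an ontology; the IDs are placeholders.
TOY_CONCEPTS = {
    "hypertension": "CONCEPT-001",
    "chest pain": "CONCEPT-002",
    "aspirin": "CONCEPT-003",
}


def link_concepts(transcript, concepts=TOY_CONCEPTS, max_len=3):
    """Return (phrase, concept_id) pairs found in a transcript.

    Hypothetical helper: scans left to right and prefers the longest
    phrase (up to max_len words) that appears in the dictionary.
    """
    words = transcript.lower().replace(",", " ").replace(".", " ").split()
    found = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in concepts:
                found.append((phrase, concepts[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next word
    return found


if __name__ == "__main__":
    print(link_concepts("Patient reports chest pain, takes aspirin daily."))
```

Preferring the longest match keeps multiword terms such as "chest pain" intact instead of splitting them into unrelated single words.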


In experiments running on Nvidia V100 and T4 graphics cards, the researchers report that Bio-Megatron achieved a score of 92.05% on a measure that takes both precision and recall into account, after 1 millisecond of processing. “This opens significant new capabilities in systems where responsiveness to patients, clinicians, and researchers is paramount … An automatic speech recognition model that can extract and relate key clinical concepts from clinical conversations can be very useful,” they wrote. “We hope our contribution will help achieve faster and better patient responses, ultimately leading to improved patient care.”
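The article does not name the metric, but a common way to combine precision and recall into one number is the F1 score, their harmonic mean. The sketch below assumes that is the measure in question; the function name `f1_score` and the sample counts are illustrative and are not taken from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, computed from raw counts.

    Assumption: F1 is the combined measure; the source only says the
    score accounts for precision and recall.
    """
    if true_pos == 0:
        return 0.0  # avoids division by zero when nothing was found correctly
    precision = true_pos / (true_pos + false_pos)  # share of predictions that were right
    recall = true_pos / (true_pos + false_neg)     # share of real items that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented counts: 8 correct concepts, 2 spurious, 2 missed.
    print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

Because it is a harmonic mean, F1 stays low if either precision or recall is low, so a system cannot score well by over-predicting or under-predicting concepts.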

Nvidia’s contribution to the research community comes after Microsoft coauthors proposed a ‘state-of-the-art’ biomedical language model dubbed PubMedBERT. They claimed industry-leading results on tasks including named entity recognition, evidence-based medical information extraction, document classification, and more.
