Welcome to Bio-RAMP Lab!

Biomedical Research in Artificial Intelligence and Machine Perception (Bio-RAMP) is a global multidisciplinary research community at the intersection of healthcare and artificial intelligence. Our goal is to provide a platform for researchers across the globe to contribute to the development and application of AI in the field of medicine. Our lab is committed to increasing the participation and engagement of researchers and communities that are currently underrepresented in this field. We believe that the diversity of perspectives and experiences will lead to more inclusive and impactful solutions.

Focus Areas

Papers

Accepted

1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis

Abstract

The first pan-African accented English speech synthesis system, able to generate speech in 75 African accents with 1000 personas representing the rich phonological diversity across the continent.

Recent advances in speech synthesis have enabled many useful applications, such as audio directions in Google Maps, screen readers, and automated content generation on platforms like TikTok. However, these systems are dominated mostly by voices sourced from data-rich geographies, with personas representative of their source data. Although 3000 of the world's languages are domiciled in Africa, African voices and personas are underrepresented in these systems. As speech synthesis becomes increasingly democratized, it is desirable to increase the representation of African English accents. We present AfroTTS, the first pan-African accented English speech synthesis system, able to generate speech in 75 African accents with 1000 personas representing the rich phonological diversity across the continent, for downstream applications in education, public health, and automated content creation. Speaker interpolation retains naturalness and accentedness, enabling the creation of new voices.
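The abstract does not say how speaker interpolation is performed. A common approach in multi-speaker TTS is to blend two learned speaker embeddings, so the sketch below assumes simple linear interpolation between two embedding vectors. The function name `interpolate_speakers` and the 4-dimensional vectors are hypothetical; real speaker embeddings are produced by the trained model and are much larger.

```python
from typing import List


def interpolate_speakers(emb_a: List[float], emb_b: List[float], alpha: float) -> List[float]:
    """Linearly blend two speaker embeddings (assumed technique, not AfroTTS's code).

    alpha = 0.0 returns emb_a, alpha = 1.0 returns emb_b, and values in
    between give a vector on the line segment between the two speakers.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]


# Hypothetical 4-dimensional embeddings for two accented speakers.
speaker_a = [0.2, -0.5, 1.0, 0.0]
speaker_b = [0.6, 0.5, -1.0, 0.4]

# A hypothetical "new" voice halfway between the two speakers; a TTS model
# conditioned on speaker embeddings would receive this vector as input.
new_voice = interpolate_speakers(speaker_a, speaker_b, 0.5)
print(new_voice)
```

Varying `alpha` sweeps a continuum of voices between the two source speakers, which is one way a fixed set of recorded speakers can yield new personas.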

Accepted

Performant Medical Named Entity Recognition from Accented Speech

Abstract

Despite some models achieving low overall Word Error Rates (WER), errors in clinical entities are higher, potentially posing substantial risks to patient safety.

Recent strides in automatic speech recognition (ASR) have accelerated its application in the medical domain, where ASR performance on accented medical named entities (NE) such as drug names, diagnoses, and lab results is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 120 African accents. Our analysis reveals that despite some models achieving low overall Word Error Rates (WER), errors in clinical entities are higher, potentially posing substantial risks to patient safety. To empirically demonstrate this, we extract clinical entities from transcripts, align ASR predictions with these entities, and compute Medical NE Recall, Medical WER, and Character Error Rate, novel metrics specifically developed to assess medical NE recognition performance in speech. We demonstrate that fine-tuning on accented clinical speech improves Medical WER by a wide margin (36% absolute), improving the models' practical applicability in healthcare environments.
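To illustrate why a low overall WER can hide clinically important errors, here is a minimal sketch of standard WER and CER (edit distance normalized by reference length) alongside a simplified entity recall. The paper's Medical WER relies on an alignment step between predictions and entities that the abstract does not detail, so it is not reproduced here. `entity_recall` is a hypothetical stand-in that only checks whether each reference entity appears verbatim in the transcript, and the example sentence and entity list are invented.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete r
                          curr[j - 1] + 1,     # insert h
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference.lower()), list(hypothesis.lower())) / len(reference)


def entity_recall(entities: List[str], hypothesis: str) -> float:
    """Hypothetical simplification: share of reference clinical entities that
    appear verbatim (as whole words) in the ASR transcript."""
    if not entities:
        raise ValueError("need at least one reference entity")
    padded = " " + " ".join(hypothesis.lower().split()) + " "
    found = sum(1 for e in entities if " " + " ".join(e.lower().split()) + " " in padded)
    return found / len(entities)


reference = "start metformin 500 mg for type 2 diabetes"
hypothesis = "start met for men 500 mg for type 2 diabetes"
entities = ["metformin", "type 2 diabetes"]  # hypothetical entity annotations

print(wer(reference, hypothesis))           # 0.375 (3 errors / 8 reference words)
print(entity_recall(entities, hypothesis))  # 0.5 (the drug name is missed)
```

Most words in the invented transcript are correct, yet the single drug name is lost entirely, which is the kind of gap an entity-focused metric exposes and an overall WER understates.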

Accepted

AccentFold: A Linguistic Journey through African Accents for Zero-Shot ASR Adaptation to Target Accents

Abstract

A method that exploits spatial relationships between learned accent embeddings to improve downstream automatic speech recognition (ASR).

Despite advancements in speech recognition, accented speech remains challenging. While previous approaches have focused on modeling techniques or creating accented speech datasets, gathering sufficient data for the multitude of accents, particularly in the African context, remains impractical due to their sheer diversity and associated budget constraints. To address these challenges, we propose AccentFold, a method that exploits spatial relationships between learned accent embeddings to improve downstream automatic speech recognition (ASR). Our exploratory analysis of speech embeddings representing 100+ African accents reveals interesting spatial accent relationships highlighting geographic and genealogical similarities and capturing consistent phonological and morphological regularities, all learned empirically from speech. Furthermore, we discover accent relationships previously uncharacterized by the Ethnologue. Through empirical evaluation, we demonstrate the effectiveness of AccentFold by showing that, for out-of-distribution (OOD) accents, sampling accent subsets for training based on AccentFold information outperforms strong baselines with a relative WER improvement of 4.6%. AccentFold presents a promising approach for improving ASR performance on accented speech, particularly in the context of African accents, where data scarcity and budget constraints pose significant challenges. Our findings emphasize the potential of leveraging linguistic relationships to improve zero-shot ASR adaptation to target accents. Please find our code for this work here.
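The abstract does not specify exactly how AccentFold uses embedding geometry to choose training accents. One plausible reading, which is an assumption here, is nearest-neighbour selection: rank the accents that have training data by cosine similarity to the unseen target accent's embedding and keep the top `k`. The function names, accent labels, 2-D vectors, and WER values below are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_training_accents(target: Sequence[float],
                            pool: Dict[str, Sequence[float]],
                            k: int) -> List[str]:
    """Return the k pool accents whose embeddings are most similar to the
    unseen target accent (assumed selection rule, not AccentFold's code)."""
    scores = {name: cosine_similarity(target, emb) for name, emb in pool.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """WER reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical 2-D accent embeddings; real ones would come from a speech encoder.
pool = {
    "accent_a": [0.9, 0.2],
    "accent_b": [-0.5, 1.0],
    "accent_c": [1.0, 0.0],
    "accent_d": [0.0, -1.0],
}
target_ood = [1.0, 0.1]  # embedding of an accent with no transcribed training data

print(select_training_accents(target_ood, pool, k=2))  # ['accent_c', 'accent_a']
print(relative_wer_improvement(0.40, 0.38))            # about 0.05, i.e. 5% relative
```

The last line also shows the difference between the two kinds of improvement quoted on this page: a WER drop from 40% to 38% is 2 points absolute but 5% relative, so a relative figure is always measured against the baseline WER.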

Researchers

Tobi Olatunji MD

Computer Science
MD, MSc

Get in touch with us

Are you interested in becoming a part of a community of researchers working at the intersection of healthcare and machine learning? Are you eager to publish or build projects in this area? Join us!

Biweekly Meetings

Saturdays at 12pm EST, 5pm WAT, 7pm SAST, 9.30pm IST

Sign Up

Register using this Form

Join the community

Drop us a message

We will get back to you as soon as possible.

Our Partners

Working with us to advance the field of Machine Learning for Healthcare and expand access to deep learning applications in medicine.