Under Review
Advancing African Accented Clinical Speech Recognition with Generative and Discriminative Multitask Supervision
- 24th INTERSPEECH Conference (Interspeech '23)
Abstract
The recent emergence of large pretrained ASR models has facilitated multiple transfer learning and domain adaptation efforts, in which performant general-purpose ASR models are fine-tuned for specific domains, such as clinical or accented speech. However, African-accented clinical speech recognition remains largely unexplored. We propose a semantically aligned, domain-specific multitask learning framework with generative and discriminative variants, and demonstrate empirically that semantically aligned multitask learning enhances ASR, outperforming the single-task architecture by 2.5% (relative). We find that the generative multitask design improves generalization to unseen accents, while the discriminative multitask approach improves clinical ASR for both majority and minority accents.