Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss

Conference proceedings article

Authors/Editors

WUTTIPONG KUMWILAISAK

Strategic Research Themes

Digital Transformation (Strategic Research Themes)

Publication Details

Author list: Hnin Yadana Lwin, Wuttipong Kumwilaisak, Chatchawarn Hansakunbuntheung, Nattanun ThatphiThakkul

Publication year: 2023

Start page: 1

End page: 5

Number of pages: 5

URL: https://dl.acm.org/doi/10.1145/3628454.3631582


Abstract

This paper introduces a data augmentation technique for alaryngeal speech based on voice conversion within the MaskCycleGAN-VC framework [6]. Our method uses two masking techniques: Articulatory Dimension Masking (ADM), and the combination of ADM with Consecutive Time Masking (CTM), which is called SpecAugment [11]. CTM is the masking technique originally used in the MaskCycleGAN-VC framework; the additional masking techniques we propose improve the quality and performance of voice conversion for alaryngeal speech and expand the variability of voice characteristics within the converted alaryngeal speech dataset. A notable enhancement in our approach is the incorporation of a timbre similarity score into the generator loss, which we call the Timbre-Enhanced Loss. This score dynamically guides the conversion process to prioritize preserving timbral characteristics during voice transformation. Experiments with several objective metrics show that the proposed method can produce synthesized alaryngeal speech whose characteristics are close to those of actual alaryngeal speech.
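The masking idea named in the abstract can be illustrated with a minimal sketch. The abstract does not define the masks precisely, so everything below is an assumption made for illustration: CTM is taken to hide one consecutive run of time frames, ADM is taken to hide one consecutive band of feature dimensions, and their combination is the element-wise product of the two masks. The function names, the mask widths, and the list-of-lists spectrogram layout (`spec[dimension][frame]`) are hypothetical and are not taken from the paper.

```python
import random

def consecutive_time_mask(n_dims, n_frames, max_width, rng):
    """Mask (1.0 = keep, 0.0 = hidden) hiding one consecutive run of time frames."""
    width = rng.randint(0, min(max_width, n_frames))   # inclusive bounds
    start = rng.randint(0, n_frames - width)
    return [[0.0 if start <= t < start + width else 1.0 for t in range(n_frames)]
            for _ in range(n_dims)]

def dimension_mask(n_dims, n_frames, max_width, rng):
    """Mask hiding one consecutive band of feature dimensions (assumed reading of ADM)."""
    width = rng.randint(0, min(max_width, n_dims))
    start = rng.randint(0, n_dims - width)
    return [[0.0 if start <= d < start + width else 1.0] * n_frames
            for d in range(n_dims)]

def combined_mask(n_dims, n_frames, max_time_width, max_dim_width, rng):
    """Combine both masks: a cell is kept only if both masks keep it."""
    ctm = consecutive_time_mask(n_dims, n_frames, max_time_width, rng)
    adm = dimension_mask(n_dims, n_frames, max_dim_width, rng)
    return [[c * a for c, a in zip(c_row, a_row)] for c_row, a_row in zip(ctm, adm)]

def apply_mask(spec, mask):
    """Element-wise product of a spectrogram-like matrix and a mask."""
    return [[s * m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(spec, mask)]

# Usage with a toy 8-dimension, 20-frame "spectrogram" of random values.
rng = random.Random(0)
spec = [[rng.uniform(-1.0, 1.0) for _ in range(20)] for _ in range(8)]
mask = combined_mask(8, 20, max_time_width=5, max_dim_width=3, rng=rng)
masked_spec = apply_mask(spec, mask)
```

In a MaskCycleGAN-VC-style setup, the masked spectrogram and the mask itself would typically both be passed to the generator, which is trained to fill in the hidden region; how the paper wires this in is not stated in the abstract.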
