Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss
Conference proceedings article
Publication Details
Author list: Hnin Yadana Lwin, Wuttipong Kumwilaisak, Chatchawarn Hansakunbuntheung, Nattanun ThatphiThakkul
Publication year: 2023
Start page: 1
End page: 5
Number of pages: 5
URL: https://dl.acm.org/doi/10.1145/3628454.3631582
Abstract
This paper introduces a data augmentation technique for alaryngeal speech based on voice conversion within the MaskCycleGAN-VC framework [6]. Our method adds two masking techniques: Articulatory Dimension Masking (ADM), and the combination of ADM with Consecutive Time Masking (CTM), referred to as SpecAugment [11]. The MaskCycleGAN-VC framework originally uses CTM alone; the proposed additional masking techniques improve the quality of voice conversion for alaryngeal speech and expand the variability of voice characteristics in the converted alaryngeal speech dataset. A further enhancement is the incorporation of a timbre similarity score into the generator loss, called the Timbre-Enhanced Loss. This score dynamically guides the conversion process to prioritize preserving timbral characteristics during voice transformation. Experiments with several objective metrics indicate that the proposed method produces synthesized alaryngeal speech with characteristics close to those of real alaryngeal speech.
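The masking schemes and the timbre term described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that CTM zeroes one run of consecutive time frames of a mel-spectrogram, that ADM zeroes one run of consecutive feature dimensions (by analogy with SpecAugment frequency masking), and that the timbre similarity score is a cosine similarity between two timbre embeddings. The function names, the mask widths, and the `weight` parameter are all hypothetical.

```python
import numpy as np


def consecutive_time_mask(n_dims, n_frames, max_width, rng):
    """Binary mask (1 = keep, 0 = hidden) that hides one run of consecutive frames."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, max_width + 1))         # run length in [0, max_width]
    start = int(rng.integers(0, n_frames - width + 1))  # run fits inside the time axis
    mask[:, start:start + width] = 0.0
    return mask


def dimension_mask(n_dims, n_frames, max_width, rng):
    """Assumed form of ADM: hides one run of consecutive feature dimensions."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, max_width + 1))
    start = int(rng.integers(0, n_dims - width + 1))
    mask[start:start + width, :] = 0.0
    return mask


def combined_mask(n_dims, n_frames, max_time, max_dim, rng):
    """Time and dimension masking together (the SpecAugment-style variant)."""
    time_part = consecutive_time_mask(n_dims, n_frames, max_time, rng)
    dim_part = dimension_mask(n_dims, n_frames, max_dim, rng)
    return time_part * dim_part


def timbre_enhanced_loss(base_loss, emb_converted, emb_target, weight=1.0):
    """Hypothetical generator loss: base loss plus a penalty for low timbre similarity.

    The embeddings stand in for the output of some timbre encoder, which is
    not specified here; cosine similarity is an assumed choice of score.
    """
    a = np.asarray(emb_converted, dtype=np.float64)
    b = np.asarray(emb_target, dtype=np.float64)
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return base_loss + weight * (1.0 - cosine)


# Usage with random placeholder data: an 80-dimensional, 64-frame spectrogram.
rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64)).astype(np.float32)
mask = combined_mask(80, 64, max_time=25, max_dim=20, rng=rng)
masked_mel = mel * mask  # the generator would receive this together with the mask
```

Under these assumptions, the generator has to fill in the hidden region from its context, and the timbre term grows as the converted speech drifts away from the target timbre.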