Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss

Conference proceedings article


Publication Details

Author list: Hnin Yadana Lwin, Wuttipong Kumwilaisak, Chatchawarn Hansakunbuntheung, Nattanun ThatphiThakkul

Publication year: 2023

Start page: 1

End page: 5

Number of pages: 5

URL: https://dl.acm.org/doi/10.1145/3628454.3631582




Abstract

This paper introduces a data augmentation technique for alaryngeal speech based on voice conversion within the MaskCycleGAN-VC framework [6]. The framework's original masking technique is Consecutive Time Masking (CTM). Our method adds two masking techniques: Articulatory Dimension Masking (ADM), and the combination of ADM with CTM, known as SpecAugment [11]. These additional masking techniques improve the quality and performance of voice conversion for alaryngeal speech and expand the variability of voice characteristics in the converted alaryngeal speech dataset. A further enhancement is the Timbre-Enhanced Loss, which incorporates a timbre similarity score into the generator loss. This score dynamically guides the conversion process to prioritize preserving timbral characteristics during voice transformation. In experiments with several objective metrics, the proposed method produces synthesized alaryngeal speech whose characteristics are close to those of actual alaryngeal speech.
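
The masking schemes named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes, since the abstract does not say so, that ADM zeroes a band of consecutive feature dimensions (rows of a mel-spectrogram), by analogy with the frequency masking in SpecAugment. The function names, the spectrogram shape, and the maximum mask widths are all hypothetical.

```python
import numpy as np


def time_mask(n_dims, n_frames, max_width, rng):
    """CTM-style mask: zero out one run of consecutive time frames (columns)."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, min(max_width, n_frames) + 1))
    start = int(rng.integers(0, n_frames - width + 1))
    mask[:, start:start + width] = 0.0
    return mask


def dimension_mask(n_dims, n_frames, max_width, rng):
    """ADM-style mask (assumed form): zero out one run of consecutive
    feature dimensions (rows)."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, min(max_width, n_dims) + 1))
    start = int(rng.integers(0, n_dims - width + 1))
    mask[start:start + width, :] = 0.0
    return mask


def combined_mask(n_dims, n_frames, max_time, max_dim, rng):
    """SpecAugment-style mask: a cell survives only if both masks keep it."""
    return (time_mask(n_dims, n_frames, max_time, rng)
            * dimension_mask(n_dims, n_frames, max_dim, rng))


rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64)).astype(np.float32)  # stand-in spectrogram
mask = combined_mask(80, 64, max_time=25, max_dim=10, rng=rng)
masked_mel = mel * mask  # masked input for the generator
```

In MaskCycleGAN-VC the generator receives the mask together with the masked spectrogram and learns to fill in the hidden region, so varying the mask shape changes what the generator has to reconstruct during training.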




Last updated on 2024-05-02 at 23:07