Alaryngeal Speech Generation Using MaskCycleGAN-VC and Timbre-Enhanced Loss

Conference proceedings article

Author/Editor

Wuttipong Kumwilaisak

Strategic Research Themes

Digital Transformation (Strategic Research Theme)

Publication Details

Author list: Hnin Yadana Lwin, Wuttipong Kumwilaisak, Chatchawarn Hansakunbuntheung, Nattanun Thatphithakkul

Publication year (CE): 2023

First page: 1

Last page: 5

Number of pages: 5

URL: https://dl.acm.org/doi/10.1145/3628454.3631582

Abstract

This paper introduces a data augmentation technique for alaryngeal speech based on voice conversion within the MaskCycleGAN-VC framework [6]. The original MaskCycleGAN-VC framework uses Consecutive Time Masking (CTM). Our method adds two masking techniques: Articulatory Dimension Masking (ADM), and the combination of ADM with CTM, called SpecAugment [11]. These additional masking techniques improve the quality and performance of voice conversion for alaryngeal speech, and they expand the variability of voice characteristics within the converted alaryngeal speech dataset. A further enhancement is the Timbre-Enhanced Loss, which incorporates a timbre similarity score into the generator loss. This score dynamically guides the conversion process to prioritize preserving timbral characteristics during voice transformation. Experiments with several objective metrics show that the proposed method produces synthesized alaryngeal speech with characteristics close to those of actual alaryngeal speech.
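
The masking step named in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the input is taken to be a mel-spectrogram of shape (feature dimensions, frames), CTM is taken to zero one run of consecutive frames, and ADM is taken to zero one band of consecutive feature dimensions, by analogy with SpecAugment frequency masking. The function names, default widths, and the NumPy implementation are hypothetical and are not taken from the paper.

```python
import numpy as np


def consecutive_time_mask(n_dims, n_frames, max_width, rng):
    """Mask of ones with one run of consecutive frames (columns) set to zero."""
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, min(max_width, n_frames) + 1))
    start = int(rng.integers(0, n_frames - width + 1))
    mask[:, start:start + width] = 0.0
    return mask


def dimension_mask(n_dims, n_frames, max_width, rng):
    """Mask of ones with one band of consecutive feature dimensions (rows) set to zero.

    Assumption: this stands in for ADM; the paper's exact definition may differ.
    """
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    width = int(rng.integers(0, min(max_width, n_dims) + 1))
    start = int(rng.integers(0, n_dims - width + 1))
    mask[start:start + width, :] = 0.0
    return mask


def apply_masks(mel, rng, max_time=25, max_dim=10, use_ctm=True, use_adm=True):
    """Apply CTM, ADM, or both (the SpecAugment-style combination) to a spectrogram.

    Returns the masked spectrogram and the binary mask that was applied.
    """
    n_dims, n_frames = mel.shape
    mask = np.ones((n_dims, n_frames), dtype=np.float32)
    if use_ctm:
        mask *= consecutive_time_mask(n_dims, n_frames, max_time, rng)
    if use_adm:
        mask *= dimension_mask(n_dims, n_frames, max_dim, rng)
    return mel * mask, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.standard_normal((80, 64)).astype(np.float32)  # placeholder input
    masked, mask = apply_masks(mel, rng)
    print(masked.shape, int(mask.size - mask.sum()), "entries masked")
```

Setting `use_adm=False` gives the CTM-only configuration, `use_ctm=False` gives ADM alone, and enabling both gives the combined setting. The Timbre-Enhanced Loss is not sketched here because the abstract does not specify how the timbre similarity score is computed.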

Keywords

No related information found